CosAE: Learnable Fourier Series for Image Restoration

(sifeiliu.net)

69 points | by E-Reverance 245 days ago

8 comments

maxbond 244 days ago
I've been dabbling in using Fourier analysis in deep learning lately, and I'm surprised that I haven't turned up very much research in this area (Fourier Neural Operators seem to be the biggest exception). Fourier analysis is such a ubiquitous tool that intuitively I'd think it would work great for deep learning. My suspicion has been that complex numbers are difficult to work with, and maybe I'm just bad at surfacing the relevant research, but I'd be interested to hear from those better informed. (My naive approach has been to simply concatenate the real and imaginary components together into an n+1 dimensional tensor, but surely there's a way that better respects the structure of complex numbers.)
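The concatenation approach above, plus one possible way to respect complex structure, can be sketched in plain Python. This is only an illustration: `stack_real_imag`, `complex_linear`, and the weight value are hypothetical names and numbers, not taken from CosAE or any library. The idea is that multiplying by a complex weight a + bi is the same as applying the real matrix [[a, -b], [b, a]] to the stacked pair, so the layer can only rotate and scale rather than mix the two channels arbitrarily.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform; returns complex coefficients."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def stack_real_imag(coeffs):
    """Turn n complex values into an n x 2 list of [real, imag] pairs."""
    return [[c.real, c.imag] for c in coeffs]

def complex_linear(pair, weight):
    """Multiply a [real, imag] pair by a complex weight using real arithmetic only.

    For weight = a + bi this applies the 2x2 matrix [[a, -b], [b, a]].
    """
    re, im = pair
    a, b = weight.real, weight.imag
    return [a * re - b * im, b * re + a * im]

signal = [1.0, 0.0, -1.0, 0.0]
coeffs = dft(signal)
stacked = stack_real_imag(coeffs)          # the "extra dimension" representation
weight = complex(0.5, 0.5)                 # hypothetical learned weight
out = [complex_linear(pair, weight) for pair in stacked]
```

Each entry of `out` matches the corresponding complex product `coeffs[i] * weight`, which is the sense in which the constrained real layer preserves the complex structure.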
- nialse 244 days ago
  Limited intuitive interpretability of phase likely restricts the broader use of discrete Fourier transforms in machine learning. Frequency, time, and amplitude are tangible and intuitive concepts, whereas phase often feels awkward and less accessible. Using a power spectrum is common practice, but it comes at the cost of losing precision.
- Scene_Cast2 244 days ago
  RoPE is somewhat related, I think, and it's pretty popular.
  There's also 2D RoPE for ViT, but I don't know exactly how it works.
- smus 244 days ago
  Convolutional neural networks are pretty big
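nialse's point about the power spectrum losing information can be shown with a small sketch, assuming a naive DFT written with the standard library (the function names and the test signal are made up for illustration). A signal and a circularly shifted copy differ only in phase, so their power spectra are identical even though the signals are not.

```python
import cmath

def dft(signal):
    """Naive O(n^2) discrete Fourier transform; returns complex coefficients."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def power_spectrum(signal):
    """Squared magnitude of each DFT coefficient; the phase is discarded."""
    return [abs(c) ** 2 for c in dft(signal)]

original = [0.0, 1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Same pulse, circularly shifted by three samples.
shifted = original[-3:] + original[:-3]

ps_original = power_spectrum(original)
ps_shifted = power_spectrum(shifted)

# The power spectra agree to floating-point precision...
same_power = all(abs(a - b) < 1e-9 for a, b in zip(ps_original, ps_shifted))
# ...but the underlying signals are different.
same_signal = original == shifted
```

Anything downstream that sees only the power spectrum cannot tell where the pulse is, which is the precision being traded away.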
nullc 244 days ago
It might be useful to use Gabor filters as the basis functions, since plain 2D cosine filters don't produce particularly sparse output for angled features. The additional overcompleteness would probably be helpful for the NN's learning.
- EMIRELADERO 244 days ago
  A fun little bit of trivia: Mammalian brains implement Gabor filters in the primary visual cortex (V1), as the first step of the visual processing pipeline.
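For readers unfamiliar with Gabor filters, here is a minimal sketch of what nullc is suggesting, assuming an isotropic Gaussian envelope and an odd kernel size. The function names and parameter values are illustrative only and are not from the CosAE paper. A Gabor kernel is a cosine carrier at orientation `theta` multiplied by a Gaussian window, so it responds strongly to a grating at its own angle and weakly to one at the perpendicular angle.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, phase=0.0):
    """Return a size x size Gabor kernel as nested lists (size assumed odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the carrier direction given by theta (radians).
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + phase)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel

def response(kernel, patch):
    """Inner product of a kernel with an image patch of the same shape."""
    return sum(k * p for krow, prow in zip(kernel, patch) for k, p in zip(krow, prow))

size, wavelength, sigma = 9, 4.0, 2.0
diag = math.pi / 4
half = size // 2

# A diagonal grating: the kind of angled feature mentioned above.
patch = [
    [math.cos(2.0 * math.pi * (x * math.cos(diag) + y * math.sin(diag)) / wavelength)
     for x in range(-half, half + 1)]
    for y in range(-half, half + 1)
]

matched = response(gabor_kernel(size, wavelength, diag, sigma), patch)
crossed = response(gabor_kernel(size, wavelength, -diag, sigma), patch)
```

Here `matched` is much larger in magnitude than `crossed`: a bank of such kernels at several orientations can represent an angled edge with a few large coefficients.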
sorenjan 245 days ago
These results look incredible, and the inference time is only 36 ms for 4x super resolution on a V100.
- E-Reverance 244 days ago
  They should make a temporally coherent version of CosAE to replace this: https://blogs.nvidia.com/blog/rtx-video-super-resolution/
syockit 244 days ago
I don't know why, but I get an uncanny feeling when looking at the restored images. Maybe it's because I know they are restored; I wonder if I'd feel the same way if I found one in the wild.
gitroom 244 days ago
Been messing with this stuff too so I get the struggle. Cool results but man, waiting on code drops always drives me nuts.
PaulRobinson 244 days ago
Wait, all my eye-rolling at the TV/film trope of "Computer, Enhance!" de-blurring is now redundant, and that stuff is real?!
This looks incredibly impressive as a result, but I'm wary of the use of metrics like FID to evaluate performance. I can take a high-res image, downsample it, then use the method and measure performance very easily: what percentage of pixels were correctly restored? Instead they're using metrics like FID which - while useful for purely generative techniques - seem a little vague for this purpose.
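The downsample-then-restore evaluation described above can be sketched as follows. Since no CosAE code is available, nearest-neighbour upsampling stands in for the restoration model, and PSNR stands in for the "percentage of pixels correctly restored" idea as one common pixel-wise metric. All function names and the synthetic test image are assumptions made for illustration.

```python
import math

def downsample(image, factor):
    """Average non-overlapping factor x factor blocks (a simple box filter)."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / factor ** 2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

def upsample_nearest(image, factor):
    """Stand-in 'restoration': repeat each pixel factor times in both directions."""
    h, w = len(image), len(image[0])
    return [
        [image[y // factor][x // factor] for x in range(w * factor)]
        for y in range(h * factor)
    ]

def psnr(reference, restored, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    n = len(reference) * len(reference[0])
    mse = sum(
        (a - b) ** 2
        for ref_row, res_row in zip(reference, restored)
        for a, b in zip(ref_row, res_row)
    ) / n
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# Synthetic 8x8 gradient used as the ground-truth high-res image.
original = [[x * 16 + y * 8 for x in range(8)] for y in range(8)]
low_res = downsample(original, 4)
restored = upsample_nearest(low_res, 4)
score = psnr(original, restored)
```

Swapping `upsample_nearest` for a real model would give a per-pixel score to report next to FID.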
- ted_dunning 243 days ago
  Notice the 4x super resolution example they gave for some text. The result is completely illegible even though it looks kind of like text.
  - maxbond 242 days ago
    The data processing inequality holds regardless of how many layers are in your neural net (processing data does not increase its information content). You can impute missing data, and with something very regular like text it could work pretty well, but that way lies hallucination.
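A toy illustration of the point above, assuming a simple box-filter downsampler (the function name and the two test images are made up for this sketch): downsampling is many-to-one, so two clearly different high-res images can produce exactly the same low-res input. Whatever a restorer outputs for that input, it has to pick between them using a prior, not the data.

```python
def downsample(image, factor):
    """Average non-overlapping factor x factor blocks (a simple box filter)."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(image[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
            / factor ** 2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]

# Two different 4x4 "high-res" images: a checkerboard and a flat gray.
checkerboard = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]
flat_gray = [[127.5] * 4 for _ in range(4)]

low_a = downsample(checkerboard, 2)
low_b = downsample(flat_gray, 2)

# Different originals, identical low-res observations.
originals_differ = checkerboard != flat_gray
observations_match = low_a == low_b
```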
doctorpangloss 244 days ago
Autoencoders are catching up. Next: luminosity separated from color and UCS.

dingdingdang 244 days ago
No code has been released though?
- sorenjan 244 days ago
  That's addressed in the paper:
    Open access to data and code
    Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
    Answer: [No]
    Justification: Although we have answered “No” for now, we intend to release the code and models to enable the reproducibility of our main experimental results, pending approval from the legal department. This temporary status reflects our commitment to open access once all necessary permissions are secured.
  - GaggiX 244 days ago
    The paper was released a few months ago for context.