# Fourier transform

I want to apply rfft to an image tensor of shape (60, 1, 256, 256). After the torch.rfft(img, signal_ndim=2) operation, the size of the output tensor is (60, 1, 256, 129, 2). Can someone please elaborate on this new output size?

Hi there! I’m going to make some assumptions about what your input dimensions represent and then tell you what the output represents based on those assumptions. I say this because I typically use the `librosa` library (highly recommended) for audio processing and have never used PyTorch’s `rfft`, but the underlying DFT math is the same. Since you say this is an image tensor, the dimensions are presumably:

• 60 - Batch size
• 1 - Number of channels (grayscale in your case?)
• 256 - Image height
• 256 - Image width

Output based on those assumptions:

• 60 - Batch size
• 1 - Number of channels
• 256 - Frequency bins along the height axis
• 129 - Frequency bins along the width axis (one-sided, see below)
• 2 - The real and imaginary parts of each complex DFT coefficient (`torch.rfft` returns a real tensor, so each complex value is stored as a pair in the last dimension)
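Under those assumptions, the output shape can be reproduced with NumPy’s `rfft2` (I’m using NumPy here rather than the old `torch.rfft` API, which has since been deprecated, but the shapes come out the same):

```python
import numpy as np

# Dummy batch with the same shape as in the question: (batch, channels, H, W)
img = np.random.rand(60, 1, 256, 256)

# 2-D real FFT over the last two axes (the same axes signal_ndim=2 covers);
# the result is complex and one-sided along the last axis: 256/2 + 1 = 129
spec = np.fft.rfft2(img)
print(spec.shape)  # (60, 1, 256, 129)

# torch.rfft returned a real tensor with a trailing axis of size 2 holding
# the real and imaginary parts; we can mimic that by stacking them:
spec_ri = np.stack([spec.real, spec.imag], axis=-1)
print(spec_ri.shape)  # (60, 1, 256, 129, 2)
```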

A little explanation:
For real-valued inputs, the second half of the DFT is redundant: each bin in it is the complex conjugate of a bin in the first half. Because of this, `rfft` has a `onesided` flag that is set to `True` by default, which means the last transformed dimension is cut down to size `N/2 + 1`; in your case that is `256/2 + 1 = 129`.
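You can check both the `N/2 + 1` size and the redundancy on a small 1-D example (again with NumPy, but the math is identical):

```python
import numpy as np

x = np.random.rand(8)    # real-valued signal, N = 8
full = np.fft.fft(x)     # full DFT: 8 complex bins
half = np.fft.rfft(x)    # one-sided DFT: 8/2 + 1 = 5 bins

print(half.shape)  # (5,)

# The one-sided output is just the first half of the full DFT...
np.testing.assert_allclose(full[:5], half)

# ...and the discarded half is redundant: X[k] = conj(X[N-k])
np.testing.assert_allclose(full[5:], np.conj(full[1:4])[::-1])
```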

I’d highly recommend researching DFTs as a general topic before using them in your code. Some topics to google are the Nyquist-Shannon sampling theorem, sampling rate vs. bandwidth (related to the first), and the discrete Fourier transform itself. `fft` is just a “fast” algorithm for computing a DFT, so it helps to first understand what a DFT is theoretically.