Fourier transform

saandeep_aathreya · August 30, 2020, 10:03pm

I want to apply rfft to an image tensor of shape (60, 1, 256, 256). After the torch.rfft(img, signal_ndim=2) operation, the size of the output tensor is (60, 1, 256, 129, 2). Can someone please elaborate on this new output size?

Thanks in advance.

Wesley_Neill · August 31, 2020, 2:47am

Hi there! I’m going to make some assumptions about what your input dimensions represent and then tell you what the output represents based on those assumptions. I say this because I typically use the librosa library (highly recommended) for audio processing and have never used PyTorch’s rfft. However, they should be almost identical:

Assumptions about your input:

60 - Batch size
1 - Number of channels (a single-channel, e.g. grayscale, image)
256 - Image height
256 - Image width

Output based on those assumptions:

60 - Batch size
1 - Number of channels
256 - Frequency bins along the height dimension (full length)
129 - Frequency bins along the width dimension (one-sided, 256/2 + 1)
2 - Real and imaginary parts of each complex coefficient
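To check the shapes yourself, here is a minimal sketch. It assumes PyTorch 1.8 or later, where torch.rfft was removed in favor of the torch.fft module; the random tensor is only a stand-in for your image batch.

```python
import torch

# Random stand-in for the image batch from the question: (batch, channels, height, width).
img = torch.randn(60, 1, 256, 256)

# 2-D real-to-complex FFT over the last two dimensions (height, width).
spec = torch.fft.rfft2(img)
print(spec.shape)  # torch.Size([60, 1, 256, 129])
print(spec.dtype)  # torch.complex64

# Expose the complex values as a trailing (real, imag) axis,
# which matches the layout the old torch.rfft returned.
spec_ri = torch.view_as_real(spec)
print(spec_ri.shape)  # torch.Size([60, 1, 256, 129, 2])
```

The complex tensor and the trailing size-2 axis hold the same numbers; only the representation differs.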

A little explanation:
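For real-valued inputs, the second half of a DFT is redundant: it is the complex conjugate of the first half (Hermitian symmetry). Because of this, rfft has a onesided flag that is set to True by default. This means the last transformed dimension of the returned DFT is cut roughly in half, to size N/2 + 1 (integer division), which in your case is 256/2 + 1 = 129. With signal_ndim=2 only that last dimension is halved, so the other transformed dimension keeps its full length of 256.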
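The redundancy is easy to see on a tiny example. This is just an illustrative sketch using a naive pure-Python DFT (the dft helper and the sample values are made up for the demonstration, and it is not how PyTorch computes the transform):

```python
import cmath

def dft(x):
    """Naive O(N^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

signal = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0]  # arbitrary real samples, N = 8
full = dft(signal)
n = len(signal)

# For real input, bin k is the complex conjugate of bin N - k.
for k in range(1, n):
    assert abs(full[k] - full[n - k].conjugate()) < 1e-9

# So only the first N // 2 + 1 bins carry independent information.
onesided = full[: n // 2 + 1]
print(len(full), len(onesided))  # 8 5
```

With N = 256 the same rule gives 256 // 2 + 1 = 129 bins.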

For more information:

I’d highly recommend researching DFTs as a general topic before trying to use them in your code. Some topics to google are the Nyquist-Shannon sampling theorem, sampling rate vs. bandwidth (related to the first), and discrete Fourier transforms. An FFT is just a “fast” way to compute a DFT, so it’s helpful to first understand what a DFT is theoretically.

If you have any more specific questions, please let me know. I will try to answer them if I possibly can. I’m not an expert, but I’m an enthusiast.

saandeep_aathreya · September 2, 2020, 4:39pm

Thank you for your detailed response @Wesley_Neill. This gives me a good context to go deeper into the topic.