# Fourier transform

I want to apply rfft to an image tensor of shape (60, 1, 256, 256). After the torch.rfft(img, signal_ndim=2) operation, the size of the output tensor is (60, 1, 256, 129, 2). Can someone please elaborate on this new output size?

Hi there! I’m going to make some assumptions about what your input dimensions represent and then tell you what the output represents based on those assumptions. I say this because I typically use the `librosa` library (highly recommended) for audio processing and have never used PyTorch’s `rfft`, but the underlying DFT math is the same. Since you say this is an image tensor, the dimensions are presumably:

• 60 - Batch size
• 1 - Number of channels (grayscale in your case?)
• 256 - Image height
• 256 - Image width

Output based on those assumptions:

• 60 - Batch size
• 1 - Number of channels
• 256 - Frequency bins along the height axis
• 129 - Frequency bins along the width axis (one-sided, see below)
• 2 - The real and imaginary parts of each complex DFT coefficient (`torch.rfft` returns a real tensor, so each complex value is stored as a pair in the last dimension)
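Under those assumptions, the output shape can be reproduced with NumPy’s `rfft2` (I’m using NumPy here rather than the old `torch.rfft` API, which has since been deprecated, but the shapes come out the same):

```python
import numpy as np

# Dummy batch with the same shape as in the question: (batch, channels, H, W)
img = np.random.rand(60, 1, 256, 256)

# 2-D real FFT over the last two axes (the same axes signal_ndim=2 covers);
# the result is complex and one-sided along the last axis: 256/2 + 1 = 129
spec = np.fft.rfft2(img)
print(spec.shape)  # (60, 1, 256, 129)

# torch.rfft returned a real tensor with a trailing axis of size 2 holding
# the real and imaginary parts; we can mimic that by stacking them:
spec_ri = np.stack([spec.real, spec.imag], axis=-1)
print(spec_ri.shape)  # (60, 1, 256, 129, 2)
```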

A little explanation:
For real-valued inputs, the second half of the DFT is redundant: each bin in it is the complex conjugate of a bin in the first half. Because of this, `rfft` has a `onesided` flag that is set to `True` by default, which means the last transformed dimension is cut down to size `N/2 + 1`; in your case that is `256/2 + 1 = 129`.
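You can check both the `N/2 + 1` size and the redundancy on a small 1-D example (again with NumPy, but the math is identical):

```python
import numpy as np

x = np.random.rand(8)    # real-valued signal, N = 8
full = np.fft.fft(x)     # full DFT: 8 complex bins
half = np.fft.rfft(x)    # one-sided DFT: 8/2 + 1 = 5 bins

print(half.shape)  # (5,)

# The one-sided output is just the first half of the full DFT...
np.testing.assert_allclose(full[:5], half)

# ...and the discarded half is redundant: X[k] = conj(X[N-k])
np.testing.assert_allclose(full[5:], np.conj(full[1:4])[::-1])
```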

I’d highly recommend researching DFTs as a general topic before using them in your code. Some topics to google are the Nyquist-Shannon sampling theorem, sampling rate vs. bandwidth (related to the first), and the discrete Fourier transform itself. `fft` is just a “fast” algorithm for computing a DFT, so it helps to first understand what a DFT is theoretically.