Torchaudio: Wrong number of timesteps returned from Spectrogram?

import torch
from torchaudio.transforms import Spectrogram
spec = Spectrogram(hop_length=160)
a = torch.randn(16000)
f = spec(a)
f.shape  # (201, 101,)

Why are there 101 frames? The padding argument is 0 by default, so there should be 98, which is what fbank returns:

from torchaudio.compliance.kaldi import fbank
m = fbank(a.unsqueeze(0))
m.shape  # (98, 23,)

@divinho I am not sure about this, but I think this extra frame is there to handle the zero-hop case for the first frame. At least in librosa's mel spectrogram feature extraction, they use centered frames for the hops.

Thank you for the quick reply. I don’t understand what you mean by the zero-hop case, though.

It means that for the first frame the hop length is zero. I hope someone more experienced will comment soon 🙂
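To put numbers on the centering idea, here is a rough sketch of the frame-count arithmetic. It assumes the Spectrogram defaults of n_fft=400 with a window of the same length, and that centering pads n_fft // 2 samples on each side of the signal; the helper name num_frames is made up for illustration and is not part of torchaudio.

```python
def num_frames(n_samples, n_fft=400, hop_length=160, center=True):
    """Count STFT frames for a 1-D signal (hypothetical helper, not a torchaudio API)."""
    if center:
        # Centered frames: assume the signal is padded by n_fft // 2 on both sides
        n_samples += 2 * (n_fft // 2)
    # One frame at offset 0, plus one more for every full hop that still fits
    return 1 + (n_samples - n_fft) // hop_length


print(num_frames(16000, center=True))   # 101
print(num_frames(16000, center=False))  # 98
```

Under these assumptions the centered count matches the 101 frames from Spectrogram, and the uncentered count matches the 98 frames from fbank.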

I see, yeah it seems the culprit is torch.stft: audio/ at 6b07bcf80fafd77cb8bee32c316ce8b55323b868 · pytorch/audio · GitHub

Which has the center argument hardcoded to True. That’s a bit annoying.
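One possible workaround is to call torch.stft directly with center=False. This is only a sketch: it assumes n_fft=400, a Hann window, and a power of 2 to roughly mirror the Spectrogram defaults, and a PyTorch version in which torch.stft accepts return_complex.

```python
import torch

a = torch.randn(16000)
n_fft, hop_length = 400, 160  # n_fft=400 is assumed to match the Spectrogram default

# Call torch.stft directly so that centering can be switched off
stft = torch.stft(
    a,
    n_fft=n_fft,
    hop_length=hop_length,
    window=torch.hann_window(n_fft),  # Hann window assumed
    center=False,
    return_complex=True,
)
power_spec = stft.abs() ** 2  # squared magnitude, i.e. power=2 (assumed)
print(power_spec.shape)  # torch.Size([201, 98])
```

With centering off, the number of frames is 98, the same as the fbank output above.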