Torchaudio: Wrong number of timesteps returned from Spectrogram?

divinho · January 6, 2021, 1:31pm

import torch
from torchaudio.transforms import Spectrogram

spec = Spectrogram(hop_length=160)  # n_fft defaults to 400
a = torch.randn(16000)
f = spec(a)
f.shape  # (201, 101)

Why are there 101 frames? The padding argument is 0 by default, so there should be 98. That is what fbank returns:

from torchaudio.compliance.kaldi import fbank
m = fbank(a.unsqueeze(0))
m.shape  # (98, 23,)

krishna511 · January 6, 2021, 1:41pm

@divinho I am not sure about this, but I think this extra frame is there to handle the zero-hop case for the first frame. At least in librosa's mel spectrogram feature extraction, the frames are centered on the hop positions.

divinho · January 6, 2021, 1:44pm

Thank you for the quick reply, I don’t understand what you mean by zero hop case though?

krishna511 · January 6, 2021, 1:52pm

It means that for the first frame the hop length is zero. I hope someone more experienced will comment soon.

divinho · January 6, 2021, 2:15pm

I see, yeah it seems the culprit is torch.stft: audio/functional.py at 6b07bcf80fafd77cb8bee32c316ce8b55323b868 · pytorch/audio · GitHub

It is called with the center argument hardcoded to True. That’s a bit annoying.
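
For reference, both frame counts can be reproduced with plain arithmetic. This is a minimal sketch, assuming center=True pads n_fft // 2 samples on each side of the signal before framing, and that n_fft is 400 (the value implied by the 201 frequency bins above). num_frames is a hypothetical helper written for illustration, not a torchaudio function.

```python
def num_frames(num_samples, n_fft=400, hop_length=160, center=True):
    """Count STFT frames for a 1-D signal (illustrative helper, not a torchaudio API)."""
    if center:
        # Assumption: centering pads n_fft // 2 samples on both sides.
        num_samples += 2 * (n_fft // 2)
    # One frame for the first window, plus one per full hop that still fits.
    return 1 + (num_samples - n_fft) // hop_length


print(num_frames(16000, center=True))   # 101, matches Spectrogram
print(num_frames(16000, center=False))  # 98, matches fbank
```

With centering, the padded signal is 16400 samples long, which gives 1 + 16000 // 160 = 101 frames. Without it, only 1 + 15600 // 160 = 98 full windows fit.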