import torch
from torchaudio.transforms import Spectrogram
spec = Spectrogram(hop_length=160)  # defaults: n_fft=400, pad=0
a = torch.randn(16000)
f = spec(a)
f.shape  # (201, 101)
Why are there 101 frames? The pad argument is 0 by default, so I expected 98, which is what fbank gives:
from torchaudio.compliance.kaldi import fbank
m = fbank(a.unsqueeze(0))
m.shape  # (98, 23)
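For reference, both numbers can be reproduced with plain arithmetic. This is only a sketch: num_frames is a hypothetical helper (not a torchaudio function), and it assumes the window length equals n_fft=400 and that centering pads n_fft // 2 samples on each side.

```python
def num_frames(num_samples, n_fft=400, hop_length=160, center=True):
    """Frame count for a framed signal (hypothetical helper for illustration)."""
    if center:
        # Centering pads n_fft // 2 samples on each side before framing.
        num_samples += 2 * (n_fft // 2)
    # One frame at offset 0, plus one per full hop that still fits a window.
    return 1 + (num_samples - n_fft) // hop_length

print(num_frames(16000, center=True))   # 101
print(num_frames(16000, center=False))  # 98
```

So 101 is what you would expect if the signal is padded by half a window on both sides, and 98 is what you get with no padding at all.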
krishna511
(krishna Chauhan)
January 6, 2021, 1:41pm
2
@divinho I am not sure about this, but I think the extra frame is there to handle the zero-hop case for the first frame. At least in librosa's mel spectrogram extraction, frames are centered on the hop positions.
Thank you for the quick reply. I don't understand what you mean by the zero-hop case, though.
krishna511
(krishna Chauhan)
January 6, 2021, 1:52pm
4
It means that for the first frame the hop length is zero. I hope someone more experienced will comment soon.
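One way to picture this is to list which samples each frame covers. The sketch below assumes n_fft=400 and hop_length=160 as in the question; frame_span is a hypothetical helper written for illustration only. With centering, frame t is centered on sample t * hop_length, so frame 0 sits on sample 0 and half of it falls into the padding.

```python
def frame_span(t, n_fft=400, hop_length=160, center=True):
    """Return (start, stop) sample indices covered by frame t (hypothetical helper)."""
    start = t * hop_length
    if center:
        # A centered frame is shifted back by half a window.
        start -= n_fft // 2
    return start, start + n_fft

# Centered: negative indices and indices >= 16000 lie in the padding.
print(frame_span(0))     # (-200, 200)
print(frame_span(100))   # (15800, 16200)

# Not centered: every frame lies fully inside the 16000 samples.
print(frame_span(0, center=False))    # (0, 400)
print(frame_span(97, center=False))   # (15520, 15920)
```

Frames 0 through 100 give the 101 centered frames, while frames 0 through 97 give the 98 frames without centering.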
I see, yeah, it seems the culprit is the torch.stft call in audio/functional.py at 6b07bcf80fafd77cb8bee32c316ce8b55323b868 · pytorch/audio · GitHub, which has the center argument hardcoded to True. That's a bit annoying.
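A possible workaround is to call torch.stft directly with center=False. This is a minimal sketch, not the torchaudio implementation: it assumes a PyTorch version where torch.stft accepts return_complex, and it uses a Hann window of length n_fft=400 with squared magnitude to roughly mirror the Spectrogram defaults.

```python
import torch

n_fft, hop_length = 400, 160
a = torch.randn(16000)
window = torch.hann_window(n_fft)

# center=False disables the n_fft // 2 padding on both sides of the signal.
stft = torch.stft(
    a,
    n_fft=n_fft,
    hop_length=hop_length,
    window=window,
    center=False,
    return_complex=True,
)
f = stft.abs().pow(2)  # power spectrogram
print(f.shape)  # torch.Size([201, 98])
```

The frame count now matches fbank, although the values themselves will not, since fbank also applies a mel filterbank and its own preprocessing.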