I’ve been using this script:
import torchaudio
spgram = torchaudio.transforms.Spectrogram(512, hop_length=32)
audio = spgram(audio)
to get the spectrogram of some stereo music audio. I expected the resulting spectrogram to have the shape [2, 257, audio.shape[-1]/32]. However, that’s not the case. For example, an audio clip of size [2, 199488] (with sr=24576) yields a spectrogram of size [2, 257, 6241] (note that 199488/32 = 6234). Why is that, and how can I convert from a frame index to a sample index?
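For reference, here is a minimal sketch of the frame arithmetic I would expect, assuming torchaudio's defaults `center=True` and `pad=0`. With `center=True`, `torch.stft` pads `n_fft // 2` samples on each side before framing. The helper names `num_frames` and `frame_to_sample` are hypothetical, and the code only does arithmetic; it does not call torchaudio. Under these assumptions the count works out to 1 + 199488 // 32 = 6235, one more than 6234 but still not the 6241 reported above, so the exact input length may be worth re-checking.

```python
# Sketch only: assumes torch.stft-style framing with pad=0 (torchaudio's default).

def num_frames(num_samples: int, n_fft: int, hop_length: int,
               center: bool = True) -> int:
    """Expected number of STFT frames for a signal of num_samples samples."""
    length = num_samples
    if center:
        # Assumption: center=True pads n_fft // 2 samples on both sides.
        length += 2 * (n_fft // 2)
    if length < n_fft:
        raise ValueError("signal is shorter than one analysis window")
    # One frame for the first window, plus one per full hop that still fits.
    return 1 + (length - n_fft) // hop_length


def frame_to_sample(frame: int, n_fft: int, hop_length: int,
                    center: bool = True) -> int:
    """Index in the original signal at the centre of the given frame."""
    start = frame * hop_length  # window start in the (possibly padded) signal
    if center:
        # The n_fft // 2 padding cancels the half-window offset,
        # so frame t is centred on original sample t * hop_length.
        return start
    # Without padding, the window centre sits n_fft // 2 past its start.
    return start + n_fft // 2


if __name__ == "__main__":
    sr = 24576
    print(num_frames(199488, n_fft=512, hop_length=32))                # 6235
    print(num_frames(199488, n_fft=512, hop_length=32, center=False))  # 6219
    s = frame_to_sample(100, n_fft=512, hop_length=32)
    print(s, s / sr)  # sample index and time in seconds
```

Dividing the sample index by the sample rate converts it to seconds; for example, frame 100 maps to sample 3200, roughly 0.130 s at sr=24576.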