Hi,
I’m a little confused about the center parameter of MelSpectrogram in torchaudio.
I read the documentation of torchaudio.transforms.MelSpectrogram here.
It says that if center=True, then the t-th frame is centered at time t × hop_length in the original audio.
Then I ran the following experiment:
import torch
import torchaudio
mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=24000, n_fft=1200, hop_length=300, n_mels=80, center=True, norm='slaney', mel_scale='slaney')
audio = torch.randn(10, 24000)
mel = mel_transform(audio)
print(mel.shape)
I get torch.Size([10, 80, 81]).
I think this is weird, since the 81st frame, according to the documentation, should be centered at 81 × 300 = 24300, but time step 24300 doesn’t even exist in the original audio.
So I’m very curious about how this happened.
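
For reference, the frame count can be reproduced by hand. The following is only a minimal sketch in plain Python, under two assumptions that are not stated in the post: that center=True pads n_fft // 2 samples on each side of the audio before framing (as torch.stft does), and that the frame index t in the documentation starts at 0.

```python
# Parameters copied from the snippet above.
n_samples = 24000
n_fft = 1200
hop_length = 300

# Assumption: center=True pads n_fft // 2 samples on each side before framing.
pad = n_fft // 2
padded_len = n_samples + 2 * pad

# Number of full windows of length n_fft that fit in the padded signal.
n_frames = 1 + (padded_len - n_fft) // hop_length

# Assumption: frames are indexed from 0, so frame t is centered at
# t * hop_length in the original (unpadded) audio.
centers = [t * hop_length for t in range(n_frames)]

print(n_frames)     # 81
print(centers[0])   # 0
print(centers[-1])  # 24000
```

Under these assumptions the 81 frames have indices 0 to 80, so the last one would be centered at sample 24000 (the very end of the clip) rather than at 24300.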