A question about the center parameter of torchaudio.MelSpectrogram

Alethia · August 6, 2023, 2:30pm

Hi,

I’m a little confused about the center parameter of MelSpectrogram in torchaudio.

I read the documentation for torchaudio.transforms.MelSpectrogram here

It says that if center=True, the t-th frame is centered at time t × hop_length in the original audio.

Then I ran the following experiment:

import torch
import torchaudio

mel_transform = torchaudio.transforms.MelSpectrogram(sample_rate=24000, n_fft=1200, hop_length=300, n_mels=80, center=True, norm='slaney', mel_scale='slaney')
audio = torch.randn(10, 24000)
mel = mel_transform(audio)
print(mel.shape)

I get torch.Size([10, 80, 81]).
I think this is weird, since according to the documentation the 81st frame should be centered at 81 × 300 = 24300, but time step 24300 doesn’t even exist in the original audio.
So I’m very curious about how this happened.
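
For reference, here is a rough sketch of the frame arithmetic as I understand it. It assumes the usual STFT convention for center=True (the signal is padded by n_fft // 2 samples on each side, and frames are counted from t = 0), and frame_centers is just a hypothetical helper I wrote for illustration, not a torchaudio function. Under those assumptions the 81 frames would have indices 0 to 80, so the last one would sit at 80 × 300 = 24000 rather than 24300.

```python
def frame_centers(num_samples, hop_length, n_fft, center=True):
    """Hypothetical helper: sample index (in the original, unpadded audio)
    that each STFT frame is centered on, assuming 0-indexed frames."""
    if center:
        # Assumed convention: pad n_fft // 2 samples on both sides, so the
        # padded length is num_samples + n_fft and frame t is centered at
        # t * hop_length in the original audio.
        num_frames = 1 + num_samples // hop_length
        return [t * hop_length for t in range(num_frames)]
    # Without padding, frame t covers [t * hop_length, t * hop_length + n_fft).
    num_frames = 1 + (num_samples - n_fft) // hop_length
    return [t * hop_length + n_fft // 2 for t in range(num_frames)]


centers = frame_centers(num_samples=24000, hop_length=300, n_fft=1200)
print(len(centers), centers[0], centers[-1])  # 81 0 24000
```

If this is right, the last frame is centered on index 24000, one past the final sample (23999), so half of its window would fall in the padded region.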