Loading mp3 file using torchaudio.load

kranti · October 5, 2021, 9:25pm

Hi, I noticed that the values read from an mp3 file differ when it is loaded with torchaudio.load versus librosa.load. The shapes of the results are also different. I am loading a 1-second mp3 file sampled at 44.1 kHz and I get the following output.

import os
import librosa
import torchaudio

# root and path point to the one-second mp3 file
librosa_audio, sr_librosa = librosa.load(os.path.join(root, path), sr=44100)
torch_audio, sr_torch = torchaudio.load(os.path.join(root, path))
print(librosa_audio.shape, sr_librosa)
print(torch_audio.shape, sr_torch)
# (44100,) 44100
# torch.Size([1, 46040]) 44100

The audio is one second long, so I expect a length of 44100 samples in both cases. Can someone please explain what is happening here?

Thanks.

ptrblck · October 6, 2021, 6:49am

I think you might be hitting this issue, so feel free to comment on it with your use case and a description of the difference.

kranti · October 8, 2021, 7:35pm

Thanks. Yes, I am indeed hitting the same issue. There seems to be a problem with mp3 loading.
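
For anyone comparing the two loaders, part of the shape difference is just layout: librosa.load returns a mono array of shape (frames,) by default, while torchaudio.load returns a tensor of shape (channels, frames). The real mismatch is the frame count. The sketch below only quantifies that gap from the shapes printed above; the helper name compare_lengths is hypothetical, and it does not identify the cause. One plausible explanation, not confirmed in this thread, is that decoders differ in how they trim the delay and padding that mp3 encoders add.

```python
def compare_lengths(shape_a, shape_b, sample_rate):
    """Compare the frame counts of two decoded signals.

    shape_a and shape_b are shape tuples such as (frames,) or
    (channels, frames); the last dimension is taken as the frame count.
    compare_lengths is a hypothetical helper, not a library function.
    """
    frames_a = shape_a[-1]
    frames_b = shape_b[-1]
    extra = frames_b - frames_a
    return {
        "frames_a": frames_a,
        "frames_b": frames_b,
        "extra_frames": extra,
        # Length difference expressed in milliseconds of audio
        "extra_ms": 1000.0 * extra / sample_rate,
    }


# Shapes and sample rate taken from the output in the question
report = compare_lengths((44100,), (1, 46040), 44100)
print(report)
```

With real data this could be called as compare_lengths(librosa_audio.shape, torch_audio.shape, sr_torch). For the shapes above, the torchaudio result is 1940 frames longer, roughly 44 ms of extra audio.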