Hi, I noticed that the sample values from an mp3 file differ when it is loaded with torchaudio.load vs librosa.load. The shapes of the returned arrays are also different. I am loading a 1-second mp3 file sampled at 44.1 kHz and getting the following output:
```python
import os
import librosa
import torchaudio

librosa_audio, sr_librosa = librosa.load(os.path.join(root, path), sr=44100)
torch_audio, sr_torch = torchaudio.load(os.path.join(root, path))
print(librosa_audio.shape, sr_librosa)
print(torch_audio.shape, sr_torch)
# (44100,) 44100
# torch.Size([1, 46040]) 44100
```
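One way to check whether the 1940 extra samples (46040 - 44100) are just leading or trailing padding, rather than different audio, is to look for the lag at which the two signals line up. Below is a minimal NumPy sketch, assuming both signals are mono 1-D arrays at the same sample rate; `best_alignment_offset` is a hypothetical helper, not part of librosa or torchaudio. It slides one signal against the other and picks the lag with the smallest mean squared difference.

```python
import numpy as np


def best_alignment_offset(reference, candidate, max_lag):
    """Return (lag, mse): the shift of `candidate` that best matches `reference`.

    Hypothetical helper: compares candidate[lag:] with reference[0:] for
    every lag in 0..max_lag and keeps the lag with the lowest mean squared
    difference over the overlapping samples.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    best_lag, best_err = 0, np.inf
    for lag in range(max_lag + 1):
        # Number of samples that overlap at this lag
        n = min(len(reference), len(candidate) - lag)
        if n <= 0:
            break
        err = np.mean((candidate[lag:lag + n] - reference[:n]) ** 2)
        if err < best_err:
            best_lag, best_err = lag, err
    return best_lag, best_err


if __name__ == "__main__":
    # Synthetic demo: the same signal with 100 zeros prepended and 50 appended
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1000)
    padded = np.concatenate([np.zeros(100), ref, np.zeros(50)])
    print(best_alignment_offset(ref, padded, max_lag=150))
```

Since torchaudio.load returns a [channels, frames] tensor while librosa.load downmixes to a 1-D mono array by default, the torchaudio result would first be converted with torch_audio.mean(dim=0).numpy() and then passed as `candidate`, with librosa_audio as `reference` and max_lag=1940. A lag near the start plus a near-zero error would suggest the two decoders return the same audio with different amounts of padding.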
Since the file is one second long, I expected a length of 44100 samples in both cases. Can someone please explain what is happening here?
Thanks.