Hi, I noticed a difference in the values from an mp3 file when it is loaded using torchaudio.load versus
librosa.load. The shapes of the returned arrays are also different. I am loading a 1-second mp3 file with a 44.1 kHz sampling rate, and I am getting the following output.
    import os
    import librosa
    import torchaudio

    # root and path point to the mp3 file (defined elsewhere)
    librosa_audio, sr_librosa = librosa.load(os.path.join(root, path), sr=44100)
    torch_audio, sr_torch = torchaudio.load(os.path.join(root, path))
    print(librosa_audio.shape, sr_librosa)
    print(torch_audio.shape, sr_torch)
    # (44100,) 44100
    # torch.Size([1, 46040]) 44100
I am loading one second of audio, so I expect a length of 44100 in both cases. Can someone please explain what is happening here?
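
To put numbers on the mismatch, here is a rough sketch of how the two decoded signals could be compared. The helper name `compare_decoded` is hypothetical, and the arrays are synthetic zero-filled stand-ins that only reproduce the shapes printed above (not the real audio); it also assumes a mono file so the torchaudio tensor can be flattened to 1-D.

```python
import numpy as np

def compare_decoded(a, b, sr):
    """Report the length gap and the sample-wise difference over the overlap.

    Hypothetical helper: assumes mono audio, so a (1, N) array flattens to N samples.
    """
    a = np.asarray(a, dtype=np.float64).reshape(-1)
    b = np.asarray(b, dtype=np.float64).reshape(-1)
    n = min(a.size, b.size)  # compare only the samples both signals have
    extra = abs(a.size - b.size)
    return {
        "extra_samples": extra,
        "extra_ms": 1000.0 * extra / sr,
        "max_abs_diff": float(np.max(np.abs(a[:n] - b[:n]))) if n else 0.0,
    }

# Synthetic stand-ins with the shapes reported above (not the real audio)
report = compare_decoded(np.zeros(44100), np.zeros((1, 46040)), 44100)
print(report)
```

With the real data, `librosa_audio` and `torch_audio.numpy()` would be passed in place of the zero arrays. For the shapes above, the gap works out to 1940 samples, or roughly 44 ms of extra audio in the torchaudio result.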