Torchaudio Spectrogram returning some data points as 0 and Log2 inf

Giuseppe_Sarno · January 13, 2020, 9:26pm

Hello,
I followed the tutorial at https://pytorch.org/tutorials/beginner/audio_preprocessing_tutorial.html
and applied log2 to the spectrogram. However, for one of my MP3 files the Spectrogram returns some values equal to 0, which makes log2 return -inf.
Is this expected, or is there something that can be done to avoid this scenario?

Thanks.

vincentqb · January 13, 2020, 11:31pm

This is expected. Do you need to apply the log transformation? If so, you can add a small value before taking the log:

import torch
epsilon = 1e-6  # small offset so that zero-valued bins do not map to -inf
log_specgram = torch.log2(specgram + epsilon)
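The suggestion above can be sketched as a small self-contained example. The tensor below is a made-up stand-in for a real spectrogram, and the helper name `safe_log2` is hypothetical (it is not part of torchaudio). Clamping to a floor is shown as one possible alternative to adding an offset; which one suits a given pipeline is a judgment call.

```python
import torch


def safe_log2(specgram: torch.Tensor, epsilon: float = 1e-6) -> torch.Tensor:
    """Return log2 of a non-negative spectrogram, offset by epsilon to avoid -inf.

    Hypothetical helper for illustration; epsilon = 1e-6 follows the reply above.
    """
    return torch.log2(specgram + epsilon)


# Made-up stand-in for a spectrogram that contains silent (zero-energy) bins.
specgram = torch.tensor([[0.0, 0.5], [2.0, 0.0]])

raw = torch.log2(specgram)                       # zero bins become -inf
offset = safe_log2(specgram)                     # finite everywhere
clamped = torch.log2(specgram.clamp(min=1e-6))   # alternative: floor instead of offset

print(torch.isinf(raw).any().item())        # whether any bin is infinite
print(torch.isfinite(offset).all().item())  # whether all bins are finite
```

The offset shifts every bin slightly, while the clamp leaves values above the floor untouched; for bins much larger than epsilon the difference is negligible.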

Giuseppe_Sarno · January 14, 2020, 11:12pm

Thank you, I will need to do something like this.

The odd problem I have is the following:

I am converting and comparing two sound files: a WAV and the MP3 of the same file. For each one I create the spectrogram and convert it to an image.
I get the issue with the MP3 version of the file, while everything is fine for the WAV (I have noticed this consistently with other audio files).
The other key difference is that, because the first data points in each row are very small compared to the others, the image is very faint when I normalize it to RGB (0-255), while the image generated from the WAV file looks good. I normalize the waveform, the spectrum and the image.
Are there techniques to avoid this problem? Is there any particular reason why the MP3 version behaves differently?

Thanks,
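One possible mitigation for the faint image, not confirmed anywhere in this thread, is to limit the dynamic range of the log spectrogram before scaling it to 0-255, so that a few very small values do not stretch the range. The sketch below assumes a log2-scaled spectrogram as input; the function name `log_spec_to_uint8` and the default `dynamic_range` of 20 log2 units (roughly 60 dB for a power spectrogram) are illustrative assumptions, not values from the original posts.

```python
import torch


def log_spec_to_uint8(log_spec: torch.Tensor, dynamic_range: float = 20.0) -> torch.Tensor:
    """Map a log2-scaled spectrogram to 0..255 after flooring far-below-peak values.

    Hypothetical helper: values more than `dynamic_range` below the maximum
    are raised to that floor before min-max scaling.
    """
    floor = float(log_spec.max() - dynamic_range)
    floored = log_spec.clamp(min=floor)
    span = floored.max() - floored.min()
    scaled = (floored - floored.min()) / (span + 1e-12)  # guard against a constant input
    return (scaled * 255.0).round().to(torch.uint8)


# Made-up log2 values: one bin sits far below the rest, e.g. log2 of a tiny epsilon.
log_spec = torch.tensor([[-19.9, 0.0], [5.0, 10.0]])
image = log_spec_to_uint8(log_spec)
```

With plain min-max scaling the outlier at -19.9 would compress every other value into the upper part of the range; flooring it first spreads the remaining values over more of the 0-255 scale.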

vincentqb · January 15, 2020, 5:47pm

Can you provide minimal code and files to reproduce this? Is the MP3 converted from the WAV or vice versa?

Giuseppe_Sarno · January 16, 2020, 10:51pm

Hi vincentqb,
I created a new topic, since the problem is now different from the question in this one: spectrogram-to-rgb-pictures-for-resnet-faint-image-with-mp3-files