When we use the `torchaudio.transforms.Spectrogram` transform we get a 2D tensor whose dimensions correspond to frequency and time. The exact frequency and time values of the rows and columns depend on the spectrogram parameters (e.g. `n_fft`, `hop_length`).
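For reference, a minimal sketch of what I mean (the file name and parameter values are just placeholders):

```python
import torchaudio

waveform, sample_rate = torchaudio.load("example.wav")  # placeholder file

transform = torchaudio.transforms.Spectrogram(n_fft=1024, hop_length=256)
spec = transform(waveform)

# spec has shape (channels, n_fft // 2 + 1, n_frames), i.e. (channels, freq, time),
# but nothing in the output says which frequency (Hz) or time (s) each bin maps to.
print(spec.shape)
```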
The librosa and scipy.signal APIs either return these values directly alongside the spectrogram or provide helper functions for computing them. Without them, interpreting the output is difficult, and users are left to reconstruct the time and frequency axes by hand, guessing at edge cases such as how the window functions behave at the beginning and end of the audio signal, which leads to off-by-one and similar errors.
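For comparison, this is roughly how the other libraries expose the axes, plus the kind of manual reconstruction a torchaudio user ends up attempting. The dummy signal and parameter values are arbitrary, and the centering/padding assumption in the last two lines is exactly the guesswork I mean:

```python
import numpy as np
from scipy import signal
import librosa
import torch
import torchaudio

sr = 16000
y = np.random.randn(sr)  # one second of dummy audio

# scipy: frequency and time axes are returned alongside the spectrogram
f, t, Sxx = signal.spectrogram(y, fs=sr, nperseg=1024, noverlap=768)

# librosa: helper functions compute the axes from the STFT parameters
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
freqs = librosa.fft_frequencies(sr=sr, n_fft=1024)
times = librosa.frames_to_time(np.arange(S.shape[-1]), sr=sr, hop_length=256)

# torchaudio: the user has to guess, e.g. assuming center=True padding so that
# frame i is centered on sample i * hop_length -- this is the error-prone part
spec = torchaudio.transforms.Spectrogram(n_fft=1024, hop_length=256)(
    torch.from_numpy(y).float()
)
guessed_freqs = torch.linspace(0, sr / 2, 1024 // 2 + 1)
guessed_times = torch.arange(spec.shape[-1]) * 256 / sr
```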
I was going to open this as a feature request on the torchaudio issues page, but there is a note saying that the issues are no longer monitored. Please let me know if torchaudio has been deprecated or if there is a better place to post this question. Thanks!