When we use the `torchaudio.transforms.Spectrogram` transform we get a 2D tensor whose dimensions correspond to frequency and time. The exact frequency and time values of the rows and columns depend on the spectrogram parameters (e.g. `n_fft`, `hop_length`).
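For reference, a minimal sketch of what I mean (the file name and parameter values are just placeholders):

```python
import torchaudio

waveform, sample_rate = torchaudio.load("example.wav")  # placeholder file

transform = torchaudio.transforms.Spectrogram(n_fft=1024, hop_length=256)
spec = transform(waveform)

# spec has shape (channels, n_fft // 2 + 1, n_frames), i.e. (channels, freq, time),
# but nothing in the output says which frequency (Hz) or time (s) each bin maps to.
print(spec.shape)
```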
The librosa and scipy.signal APIs either return these values directly alongside the spectrogram or provide helper functions for computing them. Without them, interpreting the output is difficult, and users are left to reconstruct the time and frequency axes by hand, guessing at edge cases such as how the window functions behave at the beginning and end of the audio signal, which leads to off-by-one and similar errors.
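For comparison, this is roughly how the other libraries expose the axes, plus the kind of manual reconstruction a torchaudio user ends up attempting. The dummy signal and parameter values are arbitrary, and the centering/padding assumption in the last two lines is exactly the guesswork I mean:

```python
import numpy as np
from scipy import signal
import librosa
import torch
import torchaudio

sr = 16000
y = np.random.randn(sr)  # one second of dummy audio

# scipy: frequency and time axes are returned alongside the spectrogram
f, t, Sxx = signal.spectrogram(y, fs=sr, nperseg=1024, noverlap=768)

# librosa: helper functions compute the axes from the STFT parameters
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))
freqs = librosa.fft_frequencies(sr=sr, n_fft=1024)
times = librosa.frames_to_time(np.arange(S.shape[-1]), sr=sr, hop_length=256)

# torchaudio: the user has to guess, e.g. assuming center=True padding so that
# frame i is centered on sample i * hop_length -- this is the error-prone part
spec = torchaudio.transforms.Spectrogram(n_fft=1024, hop_length=256)(
    torch.from_numpy(y).float()
)
guessed_freqs = torch.linspace(0, sr / 2, 1024 // 2 + 1)
guessed_times = torch.arange(spec.shape[-1]) * 256 / sr
```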
I was going to open this as a feature request on the torchaudio issues page, but there is a note saying that the issues are no longer monitored. Please let me know if torchaudio has been deprecated or if there is a better place to post this question. Thanks!