Batch of audio with unequal lengths and transformations

I understand that when dealing with audio clips of unequal lengths, one can define a collate function that pads every clip to the same length, i.e. the length of the longest clip in the batch. However, if I apply some transformations in my custom dataset's __getitem__, such as computing the log-Mel spectrogram, how do I pad audio of unequal lengths? My guess is to still use a collate function and pad the log-Mel spectrograms along their last (time) dimension, but I want to know what the best practices are in this regard. Thanks!
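A minimal sketch of that idea, assuming `__getitem__` returns a single tensor of shape `(n_mels, T_i)` (the name `collate_log_mel` is hypothetical): the collate function right-pads only the time axis, stacks the results, and also returns the true lengths and a mask so the model or loss can ignore padded frames.

```python
import torch
import torch.nn.functional as F


def collate_log_mel(batch, pad_value=0.0):
    """Pad a list of (n_mels, T_i) log-Mel spectrograms to a common length.

    Assumes each item is a tensor of shape (n_mels, T_i) produced by a
    (hypothetical) Dataset.__getitem__ that computes log-Mel features.
    """
    lengths = torch.tensor([spec.shape[-1] for spec in batch], dtype=torch.long)
    max_len = int(lengths.max())
    # F.pad's (left, right) tuple applies to the last dimension, i.e. time
    padded = torch.stack(
        [F.pad(spec, (0, max_len - spec.shape[-1]), value=pad_value) for spec in batch]
    )
    # True for real frames, False for padded frames; shape (B, T_max)
    mask = torch.arange(max_len)[None, :] < lengths[:, None]
    return padded, lengths, mask


specs = [torch.randn(80, 120), torch.randn(80, 95), torch.randn(80, 200)]
padded, lengths, mask = collate_log_mel(specs)
print(padded.shape)  # torch.Size([3, 80, 200])
```

It would be passed as `DataLoader(dataset, batch_size=..., collate_fn=collate_log_mel)`. Note that 0 in the log domain corresponds to a linear magnitude of 1, so padding with the log of your noise floor (or relying on the mask) may be preferable to a plain 0.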

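What you describe is the best practice as far as I know. NVIDIA's Tacotron2 source code (Tacotron2/) does the same thing: the Mel spectrograms are computed per item in the dataset and then padded along the time dimension in the collate function.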
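A variant for datasets that return `(log_mel, label)` pairs, sketched here with `torch.nn.utils.rnn.pad_sequence`; it is not NVIDIA's code, and `collate_spec_label` is a hypothetical name. Because `pad_sequence` pads along the first dimension, each spectrogram is transposed to time-first before padding and transposed back afterwards. Sorting longest-first is an assumption that only matters if the model later calls `pack_padded_sequence` with its default `enforce_sorted=True`.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def collate_spec_label(batch):
    """Collate (log_mel, label) pairs where log_mel has shape (n_mels, T_i)."""
    # Longest clip first (only needed for pack_padded_sequence with enforce_sorted=True)
    batch = sorted(batch, key=lambda item: item[0].shape[-1], reverse=True)
    specs, labels = zip(*batch)
    lengths = torch.tensor([s.shape[-1] for s in specs], dtype=torch.long)
    # pad_sequence pads dim 0, so move time to the front: (T_i, n_mels)
    padded = pad_sequence([s.transpose(0, 1) for s in specs], batch_first=True)
    # (B, T_max, n_mels) -> (B, n_mels, T_max) to match the per-item layout
    padded = padded.transpose(1, 2)
    return padded, lengths, torch.tensor(labels)


batch = [(torch.ones(80, 50), 1), (torch.ones(80, 70), 0), (torch.ones(80, 60), 2)]
padded, lengths, labels = collate_spec_label(batch)
```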