Batch of audio with unequal length and transformations

Superklez · August 3, 2021, 9:52am

I understand that when dealing with audio clips of unequal length, one can define a collate function that pads every clip to the same length, i.e. the length of the longest clip in the batch. However, if I apply a transformation in my custom dataset’s __getitem__, such as computing the log-Mel spectrogram, how should I pad the items, given that they still have unequal lengths? My guess is to still use a collate function and pad along the last dimension of each log-Mel spectrogram in the batch, but I would like to know what the best practice is in this case. Thanks!

shivammehta007 · August 3, 2021, 10:41am

As far as I know, what you describe is the best practice. NVIDIA’s Tacotron2 source code does the same thing in Tacotron2/data_utils.py.
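
For illustration, here is a minimal sketch of such a collate function. It is not taken from the Tacotron2 code. It assumes each dataset item is a (spectrogram, label) pair, that each spectrogram has shape (n_mels, time), and that zero is an acceptable padding value; the name pad_collate and the example shapes are made up for this sketch.

```python
import torch
import torch.nn.functional as F


def pad_collate(batch):
    """Pad (n_mels, time) spectrograms along the last axis to the longest item."""
    # Assumption: batch is a list of (spectrogram, label) pairs from __getitem__.
    specs, labels = zip(*batch)
    # Keep the original frame counts so padded frames can be masked later.
    lengths = torch.tensor([s.shape[-1] for s in specs])
    max_len = int(lengths.max())
    # F.pad takes (left, right) amounts for the last dimension; pad on the right only.
    padded = torch.stack([F.pad(s, (0, max_len - s.shape[-1])) for s in specs])
    return padded, lengths, torch.tensor(labels)


# Hypothetical batch: three spectrograms with 80 mel bins and different frame counts.
batch = [(torch.randn(80, t), i) for i, t in enumerate([120, 95, 143])]
padded, lengths, labels = pad_collate(batch)
```

You would pass this as DataLoader(dataset, batch_size=..., collate_fn=pad_collate). Returning the lengths alongside the padded batch lets the model or the loss ignore the padded frames. Whether zero is a sensible fill value depends on how the log-Mel values are scaled, so treat that choice as an assumption to check.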