Reshaping the MFCC

HitsujiAura · April 24, 2021, 9:04pm

Hello,

I used torchaudio to extract the MFCC from an audio clip so that it has the same number of frames as the corresponding video I extracted: 525. However, the shape of the MFCC is (1, 20, 525).
How can I reshape the data to get a shape of (525, 20), so that I can manipulate it alongside the video frames?

Thank you for your help

JuanFMontesinos · April 25, 2021, 12:25pm

You can use permute. If M is a tensor of shape (1, 20, 525), index away the leading dimension and then swap the two remaining ones to get a tensor of shape (525, 20):
M[0].permute(1, 0)
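Here is a small sketch to check the shapes. The tensor M below is a made-up stand-in with the shape from the question, not real MFCC output, and the names frames, alt and wrong are only for illustration. It also shows why reshape is probably not what you want here: it yields the right shape but reorders the values.

```python
import torch

# Hypothetical stand-in for the MFCC tensor: (channel, n_mfcc, time) = (1, 20, 525).
# Values are just a running index so that element positions are easy to check.
M = torch.arange(20 * 525, dtype=torch.float32).reshape(1, 20, 525)

# Drop the leading dimension, then swap the two remaining axes.
frames = M[0].permute(1, 0)         # shape (525, 20)

# Equivalent alternative: squeeze the size-1 dimension and transpose.
alt = M.squeeze(0).transpose(0, 1)  # shape (525, 20)

# reshape gives the same shape but does NOT swap the axes; it only
# re-chunks the flat data, so coefficients from different frames get mixed.
wrong = M.reshape(525, 20)

print(frames.shape, alt.shape, wrong.shape)
print(torch.equal(frames, alt))     # same values in the same positions
print(torch.equal(frames, wrong))   # differs from the permuted tensor
```

With this layout, frames[t] holds the 20 coefficients of frame t, so it lines up with the t-th video frame. Note that permute returns a non-contiguous view; call .contiguous() on it if a later operation such as view requires contiguous memory.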