Difficulty shaping 3-dimensional tensors from the MFCC transform for input into a model


I am using the LibriSpeech dataset and the MFCC transform from torchaudio to compute the MFCC coefficients for each waveform.

The MFCC transform returns a tensor with the following shape:

1 x 24 x N

1 = the waveform
24 = the number of MFCC coefficients
N = the number of time samples (I think?)

I then squeeze this and transpose it so it is now:

N x 24
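
A minimal sketch of that squeeze-and-transpose step. A random tensor stands in for the real MFCC output, and N = 100 is an arbitrary value assumed only for illustration:

```python
import torch

# Stand-in for the MFCC output of one waveform: (1, n_mfcc, N).
# N = 100 is an assumed value, not taken from the dataset.
mfcc = torch.randn(1, 24, 100)

# Drop the leading size-1 dimension, then swap coefficients and time.
features = mfcc.squeeze(0).transpose(0, 1)

print(features.shape)  # torch.Size([100, 24])
```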

During training I have tensors of shape:

5 x N x 24

where 5 is the batch size I am using.

How do I reshape this tensor during training or collation so that I can feed it into my neural net, which accepts an input_size of 24 and outputs 28 values (my number of classes)?

Many thanks!

Hi @Brennan, what’s the model structure of your neural net? If it consists purely of Linear layers, you can move the MFCC dimension to the last position.
Suppose your input tensor X has shape [5, 24, N]. You can swap the axes of size 24 and N with

X = X.transpose(1, 2)

When you then feed X to your neural net, the output tensor will have shape [5, N, 28].
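
To make the shapes concrete, here is a small sketch. The single `nn.Linear` layer is only a hypothetical stand-in for your model, and N = 100 is an assumed value:

```python
import torch
import torch.nn as nn

# n_frames (N) is an assumed value chosen for illustration.
batch_size, n_mfcc, n_frames, n_classes = 5, 24, 100, 28

# Hypothetical stand-in for the real network: one Linear layer, 24 -> 28.
model = nn.Linear(n_mfcc, n_classes)

X = torch.randn(batch_size, n_mfcc, n_frames)  # [5, 24, N]
X = X.transpose(1, 2)                          # [5, N, 24]

# nn.Linear acts on the last dimension only, so the batch and
# time dimensions pass through unchanged.
out = model(X)
print(out.shape)  # torch.Size([5, 100, 28])
```

Note that stacking into a single batch tensor only works if every waveform in the batch has the same N. If the lengths differ, they would need to be padded to a common length first, for example with `torch.nn.utils.rnn.pad_sequence` in the collate function.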

If your neural net is structured differently, share the details and I’m happy to help :)