Can anyone point out the error in dimensions of permute?

This is the model I'm referring to:

import torch
import torch.nn as nn

class CNN_spec(nn.Module):
    def __init__(self, num_classes=7):
        super(CNN_spec, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(1, 64, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=4, stride=4))
        self.layer3 = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=4, stride=4))
        self.layer4 = nn.Sequential(
            nn.Conv2d(128, 128, kernel_size=3, stride=1),
            nn.MaxPool2d(kernel_size=4, stride=4))
        self.layer5 = nn.LSTM(128, 1000)
        self.emotion_layer = nn.Linear(2000, num_classes)

    def forward(self, inputs):
        out = self.layer1(inputs)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = out.permute(0, 2, 1)  # the permute error is raised here
        out, (final_hidden_state, final_cell_state) = self.layer5(out)

        #out = out[:, -1, :].reshape(out.shape[0], 1, out.shape[2])

        # concatenate mean and std of the LSTM outputs -> 2 * 1000 features
        mean = torch.mean(out, 1)
        std = torch.std(out, 1)
        stat = torch.cat((mean, std), 1)
        pred_emo = self.emotion_layer(stat)
        return pred_emo

The error is: number of dims don't match in permute.
Any suggestions?


The out from self.layer4 is a 4D tensor of shape [batch, channel, h, w], but your permute treats it as a 3D tensor.

I am not sure what you are trying to do, but something like out.permute(0, 2, 3, 1) will work, since it names all four dimensions.

PS. It would be much easier to debug if you posted the whole stack trace of the error.
Also, the documentation is clear about the behavior of most modules, so it is worth checking.
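
To see why the output of layer4 is still 4D, here is a rough sketch that traces the spatial size through the four conv/pool stages of the model above using the standard output-size formula (no padding, dilation 1). The 512 x 512 input size and the helper names conv_out and trace_shape are assumptions for illustration only; the actual spectrogram size is not given in the post.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shape(h, w):
    """Follow (h, w) through layer1..layer4 of the posted model."""
    # (conv kernel, pool kernel); each pool uses stride == kernel size
    layers = [(3, 2), (3, 4), (3, 4), (3, 4)]
    for conv_k, pool_k in layers:
        h, w = conv_out(h, conv_k), conv_out(w, conv_k)
        h = conv_out(h, pool_k, stride=pool_k)
        w = conv_out(w, pool_k, stride=pool_k)
    return h, w


# Assumed input: one 512 x 512 single-channel spectrogram
h, w = trace_shape(512, 512)
shape_after_layer4 = (1, 128, h, w)  # (batch, channels, h, w): four dims
print(shape_after_layer4)  # (1, 128, 3, 3)
```

Whatever the input size, the result has four dimensions, so a three-index permute such as permute(0, 2, 1) cannot match it.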


Ok let me try that.
Thank you

Actually, this dimension error is due to a size mismatch: the LSTM takes 3D input, but the 2D CNN outputs (batch_size, no. of channels, m, n), where m x n is the size of the spectrogram, which shrinks at each conv layer. If it were a 1D CNN, permute(0, 2, 1) would be okay. So I think I need to read the spectrogram row-wise to feed the LSTM layer. How do I reshape it like that?
[batch_size, no. of channels, m x n], am I doing it right?
Please guide.
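
One possible way to do the reshape asked about here, as a sketch rather than a confirmed fix: flatten the two spatial dims into one (which reads the feature map row by row), then move channels last so each step carries 128 features. The (2, 128, 3, 3) feature-map shape and batch_first=True are assumptions; the posted model uses the default batch_first=False, which would expect (seq_len, batch, features) instead.

```python
import torch
import torch.nn as nn

# Assumed stand-in for the output of layer4: (batch, channels, h, w)
feature_map = torch.randn(2, 128, 3, 3)

# (batch, C, h, w) -> (batch, C, h*w); flatten walks each row left to right
seq = feature_map.flatten(2)
# (batch, C, h*w) -> (batch, h*w, C): one 128-dim vector per position
seq = seq.permute(0, 2, 1)

# batch_first=True is an assumption so that dim 1 is the sequence dim
lstm = nn.LSTM(input_size=128, hidden_size=1000, batch_first=True)
out, (final_hidden_state, final_cell_state) = lstm(seq)

# Statistics over the sequence dim, as in the posted forward()
mean = out.mean(dim=1)
std = out.std(dim=1)
stat = torch.cat((mean, std), dim=1)  # (batch, 2000), matches nn.Linear(2000, ...)
```

With this layout, torch.mean(out, 1) and torch.std(out, 1) pool over the sequence positions rather than over the batch.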

It's done!
just use