I am using a 3D CNN for speech data and get an output shape error from the model: Expected target size (16, 7), got torch.Size([16])

I have stacked the 2D mel spectrograms of the speech data using the np.dstack() function. The model is:

import torch.nn as nn


class Model(nn.Module):
    def __init__(self, num_classes=7):
        super(Model, self).__init__()

        # Three Conv3d -> LeakyReLU -> MaxPool3d stages; the kernels leave the last axis untouched.
        self.CNN = nn.Sequential(
            nn.Conv3d(1, 128, (3, 3, 1), stride=(1, 1, 1)), nn.LeakyReLU(0.1),
            nn.MaxPool3d((2, 2, 1), stride=(2, 2, 1)),
            nn.Conv3d(128, 64, (3, 3, 1), stride=(1, 1, 1)), nn.LeakyReLU(0.1),
            nn.MaxPool3d((2, 2, 1), stride=(2, 2, 1)),
            nn.Conv3d(64, 64, (3, 3, 1), stride=(1, 1, 1)), nn.LeakyReLU(0.1),
            nn.MaxPool3d((2, 2, 1), stride=(2, 2, 1)))

        # Merge the three spatial dims into one: (N, 64, D, H, W) -> (N, 64, D*H*W).
        self.linear = nn.Sequential(
            nn.Flatten(start_dim=2, end_dim=4))

        self.lstm = nn.LSTM(input_size=64, hidden_size=60, num_layers=2, bidirectional=True, dropout=0.5, batch_first=True)
        self.encoder_layer = nn.TransformerEncoderLayer(d_model=120, dim_feedforward=512, nhead=8)
        self.emotion_layer = nn.Linear(120, num_classes)

    def forward(self, x):
        out = self.CNN(x)
        out = self.linear(out)
        out = out.permute(0, 2, 1)  # (N, 64, L) -> (N, L, 64) for the batch_first LSTM
        out, (final_hidden_state, final_cell_state) = self.lstm(out)
        out = self.encoder_layer(out)
        out = self.emotion_layer(out)  # (N, L, num_classes): one prediction per step
        return out

I am training the model using the following snippet:

Losses, Accuracies = fp.fit_sm(model=model, optimizer=optimizer, epochs=epochs, scheduler=scheduler,
                                   trainloader=tr_loader, validloader=ts_loader,
                                   criterion=nn.CrossEntropyLoss(),
                                   device=device, verbose=True)

The Losses and Accuracies are returned by function fit_sm given below:

def fit_sm(model=None, optimizer=None, scheduler=None, epochs=None,
           trainloader=None, validloader=None,
           criterion=None, device=None,
           verbose=False):
    breaker()
    print("Training ...")
    breaker()

    # bestLoss = {"train" : np.inf, "valid" : np.inf}
    Losses = []
    Accuracies = []

    DLS = {"train": trainloader, "valid": validloader}

    start_time = time()
    for e in range(epochs):
        epochLoss = {"train": 0, "valid": 0}
        epochAccs = {"train": 0, "valid": 0}

        for phase in ["train", "valid"]:
            if phase == "train":
                model.train()
            else:
                model.eval()

            lossPerPass = []
            accuracy = []

            for X, y in DLS[phase]:
                X, y = X.to(device), y.to(device).view(-1)

                optimizer.zero_grad()
                
                with torch.set_grad_enabled(phase == "train"):
                    output= model(X)

                    loss = criterion(output.squeeze(), y)

                    if phase == "train":
                        loss.backward()
                        optimizer.step()
                lossPerPass.append(loss.item()/y.shape[0])
                accuracy.append(accuracy_score(torch.argmax(torch.exp(output.detach().cpu()), dim=1), y.cpu()))
                
            epochLoss[phase] = np.mean(np.array(lossPerPass))
            epochAccs[phase] = np.mean(np.array(accuracy))
            # Epoch Checkpoint // All or Best
        Losses.append(epochLoss)
        Accuracies.append(epochAccs)
        torch.save(model.state_dict(), "G:/Python_On_All_Dataset/emodb/SHORTCODE/transfer_learning/model_checkpoint_spec/Epoch_{}.pt".format(e+1))
        if scheduler:  # or use, if scheduler_1 or scheduler_2: // Use correct call method
            scheduler.step(epochLoss["valid"])
            # scheduler.step()

        if verbose:
            print("Epoch : {} | Train Loss : {:.5f} | Valid Loss : {:.5f} \
| Train Accuracy : {:.5f} | Valid Accuracy : {:.5f}".format(e + 1, epochLoss["train"], epochLoss["valid"],
                                                            epochAccs["train"],
                                                            epochAccs["valid"]))

    breaker()
    print("Time Taken [{} Epochs] : {:.2f} minutes".format(epochs, (time() - start_time) / 60))
    breaker()
    print("Training Complete")
    breaker()

    return Losses, Accuracies

I get the error on this line:

loss = criterion(output.squeeze(), y)

Following one solution from the forum, I tried output.squeeze(), but it still gives the same error. Can anyone spot the mistake? I know it is a shape issue, but after a lot of debugging I am still stuck.
Regards

Could you post the shapes of output (without the squeeze) as well as y and explain the dimensions and use case a bit?

Sir, I am not sure how to find the shapes of output and y.
I have tried this:

with torch.set_grad_enabled(phase == "train"):
    output = model(X)
    loss = criterion(output, y)
    print(output.shape)
    print(y.shape)

But it is not working.
The input shape is (300, 40, 3).
The model does work, though, when I try this:

I =torch.randn(30,1,300,40,3)
model=Model3D(7)
O=model(I)

Please guide.

Move the print statements before passing the tensors to the criterion. Otherwise you’ll hit the error before the prints are executed.
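As a minimal sketch of that ordering: the tensors below are random stand-ins with the shapes reported in this thread (assumptions, not the real data or model), and the try/except is only there so the snippet keeps running after the loss call fails.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Stand-in tensors (assumed shapes, not real data).
output = torch.randn(30, 315, 7)   # plays the role of model(X)
y = torch.randint(0, 7, (30,))     # class-index targets

# Print first: these lines run even though the loss call below raises.
print(output.shape)  # torch.Size([30, 315, 7])
print(y.shape)       # torch.Size([30])

error_message = None
try:
    loss = criterion(output, y)
except (RuntimeError, ValueError) as err:  # the exception type differs between PyTorch versions
    error_message = str(err)
    print("criterion failed:", error_message)
```

In the training loop itself, the two print calls would simply go between `output = model(X)` and the `criterion(...)` call.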

OK sir, got it @ptrblck.
The output shape is torch.Size([30, 315, 7])
and the y shape is torch.Size([30]).

Sir, 30 is my batch size and 7 is the number of classes here.
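For reference, here is a rough hand trace of where the 315 could come from. It assumes the (N, 1, 300, 40, 3) input used in the random-tensor test above and the default unpadded, floor-mode behaviour of Conv3d and MaxPool3d; `conv_out` is a hypothetical helper written only for this sketch.

```python
def conv_out(size, kernel, stride=1):
    # Output length of an unpadded convolution or pooling window along one axis.
    return (size - kernel) // stride + 1

# Assumed spatial size of one sample: (300, 40, 3).
d, h, w = 300, 40, 3

for _ in range(3):
    # Conv3d with kernel (3, 3, 1), stride 1.
    d, h, w = conv_out(d, 3), conv_out(h, 3), conv_out(w, 1)
    # MaxPool3d with kernel (2, 2, 1), stride (2, 2, 1).
    d, h, w = conv_out(d, 2, 2), conv_out(h, 2, 2), conv_out(w, 1, 1)

flattened = d * h * w  # what nn.Flatten(start_dim=2, end_dim=4) produces
print((d, h, w), flattened)  # (35, 3, 3) 315
```

Under these assumptions the flatten step turns (N, 64, 35, 3, 3) into (N, 64, 315), and the permute makes 315 the sequence length that the LSTM, the encoder layer and the final Linear all keep.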

I assume you are working on a multi-class classification use case based on the target shape.
If that’s the case, then the model output should have the shape [batch_size=30, nb_classes=7].
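The shape contract of nn.CrossEntropyLoss also explains the wording of the error. With class-index targets, dim 1 of the input is always read as the class dimension, so a [16, 315, 7] output is treated as 315 classes with one extra dimension of size 7, and a target of shape [16, 7] is expected. The sketch below uses random stand-in tensors; the names are illustrative only.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
batch_size, steps, nb_classes = 16, 315, 7

target = torch.randint(0, nb_classes, (batch_size,))

# Case 1: logits [N, C] with targets [N] -> valid, returns a scalar loss.
logits_2d = torch.randn(batch_size, nb_classes)
loss_ok = criterion(logits_2d, target)
print(loss_ok.shape)  # torch.Size([])

# Case 2: logits [N, 315, 7]. Dim 1 (315) is read as the class dimension,
# so the loss wants a target of shape [N, 7] with values in [0, 315).
logits_3d = torch.randn(batch_size, steps, nb_classes)
per_position_target = torch.randint(0, steps, (batch_size, nb_classes))
loss_3d = criterion(logits_3d, per_position_target)  # runs, but is not the intended loss
```

Case 2 is shown only to illustrate how the shapes are interpreted; for this use case the output needs to end up as [batch_size, nb_classes], as in case 1.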
I don’t know where the 315 is coming from, but you might need to “remove” it. For example, if it’s a temporal dimension, you could use only the last time step by indexing it in dim1, or you could reduce this dimension via e.g. torch.mean.
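A minimal sketch of both options, using a random stand-in for the model output with the shape reported above (an assumption, not the real model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()

# Stand-in for the model output: [batch, steps, classes].
output = torch.randn(30, 315, 7)
y = torch.randint(0, 7, (30,))

# Option 1: keep only the last step along dim 1.
last_step = output[:, -1, :]

# Option 2: average over dim 1.
mean_pooled = output.mean(dim=1)

loss_last = criterion(last_step, y)
loss_mean = criterion(mean_pooled, y)

# With [batch, classes] logits, argmax over dim 1 gives one class per sample.
preds = mean_pooled.argmax(dim=1)
print(last_step.shape, mean_pooled.shape, preds.shape)
# torch.Size([30, 7]) torch.Size([30, 7]) torch.Size([30])
```

The same reduction could also be done inside forward, for example before emotion_layer. Which option suits the data better is a modelling choice that this sketch does not settle.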