Only CrossEntropyLoss seems to work? MSE and others fail?

I run into a similar problem for every loss type I’m trying:

RuntimeError: The size of tensor a (5) must match the size of tensor b (8) at non-singleton dimension 1

Not sure how to fix it.

Based on the error message, dim1 has a different size in your model output and your target.
Note that nn.CrossEntropyLoss expects the output in the shape [batch_size, nb_classes, *], while the target should have the shape [batch_size, *] (without the nb_classes dimension).
Other loss functions, such as nn.MSELoss, expect the output and target to have the same shape, or shapes that can be broadcast to each other.
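To make the shape conventions concrete, here is a minimal sketch using random tensors in place of a real model output. The sizes (32 samples, 10 classes) are illustrative assumptions, not values taken from your code.

```python
import torch
import torch.nn as nn

batch_size, nb_classes = 32, 10  # illustrative values (assumption)

logits = torch.randn(batch_size, nb_classes)          # model output: [32, 10]
target = torch.randint(0, nb_classes, (batch_size,))  # class indices: [32], dtype int64

# nn.CrossEntropyLoss: output [batch_size, nb_classes], target [batch_size]
ce_loss = nn.CrossEntropyLoss()(logits, target)

# nn.MSELoss: output and target share the same shape and are floating point
regression_target = torch.randn(batch_size, nb_classes)
mse_loss = nn.MSELoss()(logits, regression_target)

# With the default reduction='mean', both losses are 0-dim (scalar) tensors
print(ce_loss.dim(), mse_loss.dim())
```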

Alright so I checked and the predicted label shape is torch.Size([32, 10])
while the ground_truth is torch.Size([32])

        pred_label = model(image)
        loss = criterion(pred_label, label)

32 is my batch size but I’m not sure where the 10 comes from.

The 10 should correspond to the number of classes, i.e. your model would output 10 logits for each sample in the batch of 32 samples.
These shapes should work (if you are really working with 10 classes).
However, the error message points to different sizes in dim1 (5 vs. 8), so there still seems to be a discrepancy between the error and the shapes you’ve posted.

So I guess I need to do an argmax on my batch, because right now every image in the batch has 1000 class predictions.

If I understand correctly, a batch of 8 should have 8 class predictions, i.e. 1 per image.

Do you know how I can achieve this with tensors?

 for image_batch, label_batch, path in train_dl:

    pred_batch = model(
    #pred_batch.shape = torch.Size([8, 1000])
    #something here, to reshape to [8, 8]
    batch_loss = criterion(pred_batch,

Sorry if the numbers keep changing. They come from different runs, but it’s ultimately the same issue.

It seems you might be using a pretrained model, since your output has a shape of [batch_size, 1000] (the 1000 classes would then correspond to the ImageNet classes).
Based on your description, you are dealing with 8 classes, so I would recommend replacing the last linear layer so that the model outputs only 8 logits.
Have a look at the transfer learning tutorial for more information.
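As a rough sketch of what replacing the last layer could look like: TinyPretrainedNet below is a hypothetical stand-in for your pretrained network (I don't know which architecture you are using), and nb_classes = 8 is the value assumed in this post. In torchvision's ResNet models the final layer is also called fc, but the attribute name differs between architectures, so check your model's definition.

```python
import torch
import torch.nn as nn

class TinyPretrainedNet(nn.Module):
    """Hypothetical stand-in for a pretrained ImageNet classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(16, 1000)  # 1000 outputs, like an ImageNet head

    def forward(self, x):
        return self.fc(self.features(x))

nb_classes = 8  # assumption: set this to the number of classes in your dataset

model = TinyPretrainedNet()
# Swap the last linear layer: keep its input size, change only the output size
model.fc = nn.Linear(model.fc.in_features, nb_classes)

images = torch.randn(4, 3, 32, 32)  # dummy batch of 4 RGB images
print(model(images).shape)  # torch.Size([4, 8])
```

The new layer is randomly initialized, so it has to be trained on your data even if the rest of the network is kept frozen.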

Yes exactly!

However, I have 5 classes and my batch size is 8.
I previously tried what you suggested and here’s what I got:

RuntimeError: The size of tensor a (5) must match the size of tensor b (8) at non-singleton dimension 1

torch.Size([8, 5])

For a batch size of 8 and 5 classes your outputs should have the shape [8, 5] and the targets [8] containing values in [0, 4].
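A quick way to verify these conditions before calling the criterion is sketched below. check_classification_batch is a hypothetical helper written for this post, and the tensors are random placeholders for your real batch. It also shows that argmax is only needed to read off the predicted classes (e.g. for accuracy), not to compute the loss.

```python
import torch
import torch.nn.functional as F

def check_classification_batch(outputs, targets, nb_classes):
    """Hypothetical helper: validate shapes and label range for nn.CrossEntropyLoss."""
    assert outputs.dim() == 2 and outputs.size(1) == nb_classes, outputs.shape
    assert targets.shape == (outputs.size(0),), targets.shape
    assert targets.dtype == torch.long, targets.dtype
    assert 0 <= int(targets.min()) and int(targets.max()) < nb_classes

outputs = torch.randn(8, 5)          # dummy logits: 8 samples, 5 classes
targets = torch.randint(0, 5, (8,))  # dummy class indices in [0, 4]
check_classification_batch(outputs, targets, nb_classes=5)

# The loss consumes the raw logits directly
loss = F.cross_entropy(outputs, targets)

# argmax over the class dimension gives one predicted class per image
predicted = outputs.argmax(dim=1)    # shape [8]
accuracy = (predicted == targets).float().mean()
```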

Yes, I agree. Here’s what I have:

  for image_batch, label_batch in train_dl:

    pred_batch = model(
torch.Size([8, 5])
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/ UserWarning: Using a target size (torch.Size([8])) that is different to the input size (torch.Size([8, 5])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)
RuntimeError: The size of tensor a (5) must match the size of tensor b (8) at non-singleton dimension 1

The shapes I mentioned are the ones nn.CrossEntropyLoss expects. nn.MSELoss instead expects the output and target to have the same shape, or shapes that can be broadcast to each other, as explained in my first post.
If you want to use nn.MSELoss for a classification use case, you could probably create a one-hot encoded target tensor via:

label_batch = F.one_hot(label_batch, num_classes=5).float()  # one_hot returns int64; cast so it matches the float output
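Here is a self-contained sketch that reproduces the error and applies the one-hot approach. The tensors are random placeholders for your model output and labels. The cast to float is there because F.one_hot returns an integer (int64) tensor, which would not match the floating-point model output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

criterion = nn.MSELoss()
pred_batch = torch.randn(8, 5, requires_grad=True)  # dummy model output: [8, 5]
label_batch = torch.randint(0, 5, (8,))             # dummy class indices: [8]

# [8, 5] and [8] cannot be broadcast: the trailing sizes 5 and 8 differ,
# so this raises the RuntimeError from the thread (after a UserWarning)
try:
    criterion(pred_batch, label_batch.float())
except RuntimeError as err:
    print("MSELoss failed:", err)

# One-hot encode the labels to [8, 5] and cast to float
one_hot = F.one_hot(label_batch, num_classes=5).float()
loss = criterion(pred_batch, one_hot)
loss.backward()  # gradients flow back to pred_batch
print(one_hot.shape, loss.item())
```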