Only CrossEntropyLoss seems to work? MSE and others fail?

alx · April 15, 2020, 10:20pm

I run into a similar problem for every loss type I’m trying:

RuntimeError: The size of tensor a (5) must match the size of tensor b (8) at non-singleton dimension 1

Not sure how to fix it.

ptrblck · April 16, 2020, 5:24am

Based on the error message, your model output and target have different sizes in dim1.
Note that nn.CrossEntropyLoss expects the output in the shape [batch_size, nb_classes, *], while the target should have the shape [batch_size, *] (i.e. without the nb_classes dimension) and contain class indices.
Other loss functions, such as nn.MSELoss, expect both tensors to have the same shape, or broadcast them internally if possible.
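A minimal sketch of the two shape contracts, using random tensors as stand-ins for real model outputs and targets (the batch size, class count and spatial size below are arbitrary assumptions). It also shows what the optional trailing dimensions `*` mean for nn.CrossEntropyLoss, e.g. per-pixel classification:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, nb_classes = 4, 3  # arbitrary example sizes

# nn.CrossEntropyLoss: logits are [batch_size, nb_classes, *],
# targets are int64 class indices of shape [batch_size, *] (no class dim).
ce = nn.CrossEntropyLoss()
logits = torch.randn(batch_size, nb_classes)
target = torch.randint(0, nb_classes, (batch_size,))
ce_loss = ce(logits, target)

# The trailing dims (*) work the same way, e.g. logits [N, C, H, W]
# with per-pixel class indices [N, H, W].
logits_2d = torch.randn(batch_size, nb_classes, 5, 5)
target_2d = torch.randint(0, nb_classes, (batch_size, 5, 5))
ce_loss_2d = ce(logits_2d, target_2d)

# nn.MSELoss: input and target have the same shape
# (both floating-point tensors here).
mse = nn.MSELoss()
pred = torch.randn(batch_size, nb_classes)
regression_target = torch.randn(batch_size, nb_classes)
mse_loss = mse(pred, regression_target)

print(ce_loss.shape, ce_loss_2d.shape, mse_loss.shape)  # scalar losses with the default reduction
```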

alx · April 16, 2020, 8:07pm

Alright, so I checked: the predicted label shape is torch.Size([32, 10]), while the ground truth shape is torch.Size([32]).

        pred_label = model(image)
        loss = criterion(pred_label, label)

32 is my batch size, but I’m not sure where the 10 comes from.

ptrblck · April 17, 2020, 3:29am

The 10 should correspond to the number of classes, i.e. your model would output 10 logits for each sample in the batch of 32 samples.
These shapes should work (if you are really working with 10 classes).
However, the error message points to different sizes in dim1 (5 vs. 8), so there still seems to be a discrepancy between the error and the shapes you’ve posted.
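To narrow down such a discrepancy, a small check right before the loss call can help. The helper below, `check_ce_shapes`, is hypothetical (not part of PyTorch) and assumes the class-index form of nn.CrossEntropyLoss targets without an `ignore_index`; it accepts the [32, 10] output / [32] target pair posted above:

```python
import torch


def check_ce_shapes(output: torch.Tensor, target: torch.Tensor) -> None:
    """Hypothetical helper: raise ValueError if output/target do not follow
    the class-index convention of nn.CrossEntropyLoss (no ignore_index)."""
    if output.dim() < 2:
        raise ValueError(f"output must be [batch_size, nb_classes, *], got {tuple(output.shape)}")
    # Target shape is the output shape with the class dimension (dim1) removed.
    expected = tuple(output.shape[:1]) + tuple(output.shape[2:])
    if tuple(target.shape) != expected:
        raise ValueError(f"target shape {tuple(target.shape)} != expected {expected}")
    if target.dtype != torch.long:
        raise ValueError(f"class-index targets should be int64, got {target.dtype}")
    nb_classes = output.shape[1]
    if target.numel() > 0 and (target.min().item() < 0 or target.max().item() >= nb_classes):
        raise ValueError(f"target values must lie in [0, {nb_classes - 1}]")


# The shapes from the post above pass the check (10 classes, batch of 32).
check_ce_shapes(torch.randn(32, 10), torch.randint(0, 10, (32,)))
```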

alx · April 18, 2020, 12:34am

So I guess I need to do an argmax on my batch, because right now every image in the batch has 1000 class predictions.

If I understand correctly, a batch of 8 should have 8 class predictions, i.e. 1 per image.

Do you know how I can achieve this with tensors?

    for image_batch, label_batch, path in train_dl:
        pred_batch = model(image_batch.to(device))
        # pred_batch.shape == torch.Size([8, 1000])
        # something here, to reshape to [8, 8]
        batch_loss = criterion(pred_batch, label_batch.to(device))

Sorry if the numbers keep changing; these are different runs, but ultimately it’s the same issue.
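On the argmax question: `argmax` over the class dimension does give one predicted class per image, turning [8, 1000] into [8]. A minimal sketch with a random tensor standing in for `model(image_batch)`; note that the loss should still receive the raw logits, because `argmax` returns integer indices that carry no gradient:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Stand-in for model(image_batch): 8 images, 1000 logits each.
pred_batch = torch.randn(8, 1000, requires_grad=True)

# One predicted class index per image, shape [8], dtype int64.
pred_classes = pred_batch.argmax(dim=1)
print(pred_classes.shape)  # torch.Size([8])

# The loss uses the logits [8, 1000] and the class-index targets [8];
# gradients flow back into the logits, not through argmax.
labels = torch.randint(0, 1000, (8,))
loss = nn.CrossEntropyLoss()(pred_batch, labels)
loss.backward()
print(pred_batch.grad.shape)  # torch.Size([8, 1000])
```

The predicted indices are useful for metrics such as accuracy; for training, the number of logits per sample has to match the number of classes instead.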

ptrblck · April 18, 2020, 1:43am

It seems you might be using a pretrained model, since your output has a shape of [batch_size, 1000] (the 1000 classes would then correspond to the ImageNet classes).
Based on your description, you are dealing with 8 classes, so I would recommend changing the last linear layer so that it outputs only 8 logits.
Have a look at the transfer learning tutorial for more information.
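A minimal sketch of swapping the last linear layer. `TinyNet` is a hypothetical stand-in for a pretrained backbone; it stores its classifier in `.fc`, the attribute name torchvision’s ResNet models use, so the same assignment pattern would apply there. `num_classes` should be set to the dataset’s class count (5, as stated later in the thread):

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained model with a 1000-way classifier."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(16, 1000)  # ImageNet-style 1000 logits

    def forward(self, x):
        return self.fc(self.features(x))


num_classes = 5  # number of classes in your dataset
model = TinyNet()
# Replace the classifier, keeping the feature size the old layer received.
model.fc = nn.Linear(model.fc.in_features, num_classes)

out = model(torch.randn(8, 3, 32, 32))  # batch of 8 RGB images
print(out.shape)  # torch.Size([8, 5])
```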

alx · April 18, 2020, 1:53am

Yes exactly!

However, I have 5 classes and my batch size is 8.
I previously tried what you suggested and here’s what I got:

RuntimeError: The size of tensor a (5) must match the size of tensor b (8) at non-singleton dimension 1

torch.Size([8, 5])

ptrblck · April 18, 2020, 1:55am

For a batch size of 8 and 5 classes your outputs should have the shape [8, 5] and the targets [8] containing values in [0, 4].

alx · April 18, 2020, 1:59am

Yes, I agree. Here’s what I have:

  for image_batch, label_batch in train_dl:
    optimizer.zero_grad()

    pred_batch = model(image_batch.to(device))
    print(pred_batch.shape)
    print(label_batch.shape)

torch.Size([8, 5])
torch.Size([8])
/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py:431: UserWarning: Using a target size (torch.Size([8])) that is different to the input size (torch.Size([8, 5])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)

RuntimeError: The size of tensor a (5) must match the size of tensor b (8) at non-singleton dimension 1
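The warning and the error can be reproduced without a model. nn.MSELoss broadcasts the [8] target against the [8, 5] input by aligning trailing dimensions, so the target’s 8 is compared with the input’s 5 in dim1. A minimal sketch with random tensors (the target is cast to float here only to isolate the shape problem):

```python
import warnings

import torch
import torch.nn as nn

torch.manual_seed(0)
pred_batch = torch.randn(8, 5)           # [batch_size, nb_classes]
label_batch = torch.randint(0, 5, (8,))  # [batch_size], class indices

error = None
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        nn.MSELoss()(pred_batch, label_batch.float())
    except RuntimeError as exc:
        error = str(exc)

print(error)  # reports the 5 vs. 8 size mismatch in dim 1

# The same pair of tensors is valid input for nn.CrossEntropyLoss.
ce_loss = nn.CrossEntropyLoss()(pred_batch, label_batch)
print(ce_loss.dim())  # 0, a scalar loss
```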

ptrblck · April 18, 2020, 2:02am

The provided shapes are the ones nn.CrossEntropyLoss expects, while nn.MSELoss expects both tensors to have the same shape (or to be broadcastable), as explained in my first post.
If you want to use nn.MSELoss for a classification use case, you could probably create a one-hot encoded tensor via:

# F is torch.nn.functional; one_hot returns int64, so cast to float for nn.MSELoss
label_batch = F.one_hot(label_batch, num_classes=5).float()
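A minimal end-to-end sketch of the one-hot approach, with a random tensor standing in for the model output of a batch of 8 and 5 classes. `F.one_hot` produces an int64 tensor of shape [8, 5]; casting it to the output’s dtype gives nn.MSELoss two floating-point tensors of the same shape:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_classes = 5
pred_batch = torch.randn(8, num_classes, requires_grad=True)  # stand-in for model output
label_batch = torch.randint(0, num_classes, (8,))              # class indices in [0, 4]

# [8] int64 indices -> [8, 5] one-hot rows, cast to float32 like the output.
target = F.one_hot(label_batch, num_classes=num_classes).to(pred_batch.dtype)

loss = nn.MSELoss()(pred_batch, target)
loss.backward()
print(target.shape, loss.dim())  # torch.Size([8, 5]) 0
```

This sketch feeds the raw outputs into the loss, as in the thread; it does not decide whether a softmax should be applied to them first.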