Understanding an error I got while training my model for semantic segmentation

I am having difficulty understanding a warning I got while training. It reads:

 UserWarning: Using a target size (torch.Size([10, 1, 224, 224])) that is different to the input size (torch.Size([10, 60, 224, 224])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
  return F.mse_loss(input, target, reduction=self.reduction)

My target is the mask for each image, with shape [batch size, channels, height, width], and my input is the output of my network, with shape [batch size, number of classes, height, width]. Can anyone help me understand this?

The number of channels in the input (60) does not match the number in the target (1). You should change the loss to CrossEntropyLoss.
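To see what the warning is pointing at, here is a minimal sketch of the broadcasting it mentions. The tensor sizes and the names `pred` and `mask` are made up for illustration and are much smaller than the ones in the question; the point is only that a 1-channel target gets silently copied across every class channel before the element-wise MSE is computed.

```python
import torch

# Illustrative sizes only: 2 images, 3 classes, 4x4 pixels
pred = torch.randn(2, 3, 4, 4)                     # network output, N x C x H x W
mask = torch.randint(0, 3, (2, 1, 4, 4)).float()   # target with a single channel

# Broadcasting expands the single mask channel to match the 3 class channels
_, expanded = torch.broadcast_tensors(pred, mask)
print(expanded.shape)

# Every channel of the expanded mask is the same copy of the original channel
same_copy = torch.equal(expanded[:, 0], expanded[:, 2])
print(same_copy)
```

So with MSE each class channel would be compared against the raw class ids, which is not a meaningful segmentation loss.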

Thank you so much for answering. I tried changing my loss function to cross entropy but I got an error:

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

What does this mean?

With CrossEntropyLoss, the target must have shape NxHxW, where each element is an integer in the range [0, C-1], and the input must have shape NxCxHxW.

Can I clarify something? Does N here represent the number of channels or the batch size?

It’s the batch size. In most of PyTorch, except for RNNs as far as I know, the first dimension is always the batch size.

Just to confirm: the target is NxHxW and the input is NxCxHxW?

Yes. I think you should read the documentation again to understand the function better.
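For reference, a minimal sketch of the shapes CrossEntropyLoss accepts for segmentation. The sizes are small placeholders (not the 10 x 60 x 224 x 224 from the question), and `logits` and `target` are random stand-ins for a real network output and mask.

```python
import torch
import torch.nn as nn

N, C, H, W = 2, 5, 4, 4  # illustrative batch size, classes, height, width

logits = torch.randn(N, C, H, W)          # input: raw float scores, N x C x H x W
target = torch.randint(0, C, (N, H, W))   # target: integer class ids in [0, C-1], N x H x W

loss_function = nn.CrossEntropyLoss()
loss = loss_function(logits, target)      # averaged over all pixels by default

print(logits.shape, target.shape, target.dtype)
print(loss.dim())  # 0, the loss is a scalar tensor
```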

Thank you! Can I ask what values I should expect from my output? My training script looks like this:

import torch
from tqdm import tqdm

def train(model):
    for epoch in range(EPOCHS):
        for i in tqdm(range(0, len(img_all), BATCH_SIZE)):
            batch_img_all = img_all[i:i+BATCH_SIZE].view(-1, 3, 224, 224)
            batch_mask_all = mask_all[i:i+BATCH_SIZE].view(-1, 1, 224, 224)

            optimizer.zero_grad()    # Clear gradients left over from the previous step
            outputs = model(batch_img_all)
            batch_mask_all = torch.argmax(batch_mask_all, dim=1)
            loss = loss_function(outputs, batch_mask_all)
            loss.backward()          # Compute gradients; without this, step() changes nothing
            optimizer.step()         # Does the update

        print(f"Epoch: {epoch}. Loss: {loss}")

    return batch_img_all, batch_mask_all, outputs

IMG, MSK, OUTPUT = train(model)

The outputs depend on the activation function of the output layer in your model. You must be careful when choosing this activation function, because each loss function expects its input in a particular range of values.

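In your code snippet, if you use CrossEntropyLoss, you should not use any activation function in the output layer, because the loss applies log-softmax internally. However, you took the argmax over the channel dimension of your ground truth, which has only 1 channel. This step does not make sense: the argmax over a dimension of size 1 is always 0, so every label becomes 0.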
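Both points can be checked on toy tensors. This is only a sketch with made-up sizes and values: the first part compares cross-entropy on raw logits with NLL loss on log-softmax outputs, and the second applies argmax to a mask that has a single channel.

```python
import torch
import torch.nn.functional as F

# Part 1: cross-entropy on raw logits matches NLL loss on log-softmax outputs,
# so the model's last layer does not need its own activation.
logits = torch.randn(2, 5, 4, 4)           # N x C x H x W, illustrative sizes
target = torch.randint(0, 5, (2, 4, 4))    # N x H x W class ids
ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))

# Part 2: argmax over a channel dimension of size 1 can only return index 0,
# so the original class ids (0, 3, 7, 59 here) are lost.
mask = torch.tensor([[[[0., 3.], [7., 59.]]]])   # shape (1, 1, 2, 2)
collapsed = torch.argmax(mask, dim=1)            # shape (1, 2, 2)
print(collapsed)
```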

Thank you for correcting me. But is there any way I could solve this error?

RuntimeError: only batches of spatial targets supported (3D tensors) but got targets of dimension: 4

This is the error I got before I added the argmax call.

Just convert batch_mask_all into a 3D tensor by squeezing dimension 1.
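A minimal sketch of that step, using a zero-filled placeholder with the same shape as the mask batch in the question:

```python
import torch

batch_mask_all = torch.zeros(10, 1, 224, 224)   # placeholder for a real mask batch
target = batch_mask_all.squeeze(1)              # drop the singleton channel dimension

print(batch_mask_all.shape)  # torch.Size([10, 1, 224, 224])
print(target.shape)          # torch.Size([10, 224, 224])
```

Passing the dimension explicitly (`squeeze(1)`) is safer than a bare `squeeze()`, which would also remove the batch dimension if a batch happened to contain a single image.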

Thank you again! I will try this.

Hey! I am really grateful for your patience in answering my questions. I am really new to this field and would really like to get my current project working. Anyway, I tried squeezing dimension 1 but I got the error below. Do you know what it means? Some forum posts said I should convert my target's data type, but I am not confident about doing that.

RuntimeError: expected scalar type Long but found Float

Yes, the target for CrossEntropyLoss must be of type Long, while the input must be of type Float.
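A small sketch of the conversion, assuming the mask already stores whole-number class ids as floats (the values and sizes below are invented). If the mask were stored in some scaled form instead, for example divided by 255, `.long()` would truncate the values and the ids would need to be restored first.

```python
import torch
import torch.nn.functional as F

# Float mask holding whole-number class ids, shape N x H x W (illustrative values)
mask = torch.tensor([[[0., 1.], [2., 1.]]])
target = mask.long()                  # class ids as int64 ("Long")

logits = torch.randn(1, 3, 2, 2)      # float32 network output, N x C x H x W
loss = F.cross_entropy(logits, target)

print(mask.dtype, target.dtype, logits.dtype)
```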

Thank you so much for your help! I will try to edit my code. Hopefully it will work this time.

Hi! I am back again. I tried converting my input and target data types to Float and Long respectively, but I got an error again:

IndexError                                Traceback (most recent call last)
<ipython-input-15-0148a2dd2dca> in <module>()
     21   return batch_img_all, batch_mask_all, outputs
---> 23 IMG, MSK, OUTPUT = train(model)

4 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2264         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2265     elif dim == 4:
-> 2266         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2267     else:
   2268         # dim == 3 or dim > 4

IndexError: Target 60 is out of bounds.

I suspect the value 60 has something to do with my class indices, since I have 60 classes. But I cannot understand why I get this error when I have already declared my number of classes. Can you help me again? Thank you so much.

This happens because, if you have N foreground classes (1, 2, 3, …, N), your model must produce a segmentation map with N+1 channels, in which the first channel is for the background class (0).
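One way to catch this before training is to compare the label range in the masks with the number of output channels. The helper below is hypothetical (`labels_fit` is not a PyTorch function) and the tiny tensor stands in for a real mask:

```python
import torch

def labels_fit(target: torch.Tensor, num_classes: int) -> bool:
    """Hypothetical helper: True if every label lies in [0, num_classes - 1]."""
    lowest, highest = int(target.min()), int(target.max())
    return lowest >= 0 and highest < num_classes

# Toy mask with background (0) and foreground ids up to 60
target = torch.tensor([[0, 12, 60]])

print(labels_fit(target, 60))  # False: label 60 needs a 61st channel
print(labels_fit(target, 61))  # True
```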

My solution was to declare a total of 61 classes. Does that make sense? By the way, I trained for just 1 epoch to see whether this makes my code work, and it gave me this result:

100%|██████████| 150/150 [16:06<00:00,  6.45s/it]Epoch: 0. Loss: 4.002954483032227

Is it reasonable to get this kind of loss value? What range should the loss be in?
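As a rough reference point rather than a full answer: cross-entropy is never negative and has no fixed upper bound, and a model that assigns equal probability to all C classes gets a loss of ln(C). The sketch below computes that baseline for the 61 classes mentioned above, using only the numbers from this thread.

```python
import math

num_classes = 61                   # 60 foreground classes plus background
observed_loss = 4.002954483032227  # value printed after the first epoch

# Cross-entropy of a uniform prediction over all classes
uniform_loss = math.log(num_classes)

print(round(uniform_loss, 4))        # 4.1109
print(observed_loss < uniform_loss)  # True
```

A loss just under this baseline after a single epoch would be consistent with a model that has only started to learn. What matters more than the absolute value is whether it keeps decreasing over further epochs.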