Only batches of spatial targets supported (non-empty 3D tensors) but got targets of size: : [1, 1, 256, 256]

In your LogNLLLoss class you are applying torch.log to the model output, while cross_entropy expects raw logits, so you might want to remove it.
Assuming the model outputs logits, I think you might also need to apply F.softmax to them for the dice loss calculation, as this criterion uses the probabilities, if I’m not mistaken.
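A minimal sketch of what that might look like (`model`, `images`, `targets`, and `soft_dice_loss` just stand in for the code under discussion):

```python
import torch.nn.functional as F

logits = model(images)                      # raw logits, shape [N, C, H, W]

# F.cross_entropy applies log_softmax internally, so feed it raw logits
ce_loss = F.cross_entropy(logits, targets)  # targets: [N, H, W] class indices

# the dice loss works on probabilities, so softmax over the class dim first
probs = F.softmax(logits, dim=1)
dice_loss = soft_dice_loss(probs, targets)
```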

@ptrblck Sorry for the confusion, but I am now using BCEWithLogitsLoss since I found it works better. I don’t run my output layer through an activation since BCEWithLogitsLoss applies a sigmoid. Do you think in the dice loss I should use pred = F.softmax(pred) in the first line of the forward pass of the SoftDiceLoss class? And over which dimension should it be applied?
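For reference, a sketch of the single-channel setup being described (names are placeholders; with one output channel, torch.sigmoid rather than F.softmax would recover the probabilities that BCEWithLogitsLoss assumes internally):

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()        # applies sigmoid internally

logits = model(images)                    # [N, 1, H, W], no activation on the output layer
bce = criterion(logits, targets.float())  # targets: [N, 1, H, W] with values in {0, 1}

# for the dice term, convert the logits to probabilities explicitly;
# with a single output channel that means sigmoid, not softmax
probs = torch.sigmoid(logits)
dice = soft_dice_loss(probs, targets)
```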

hi @ptrblck, sorry to add onto this thread, but I’m an undergrad working on CV for the first time for my senior thesis and also ran into the original error when trying to run DeepLab. My images are in RGB format and I’m trying to do multiclass segmentation, so I have 3 classes and my masks contain the pixel values 0, 1, and 2 (1 channel).

But even after reading in my images and calling target.squeeze(1), I still get this error:

  File "/n/home07/michelewang/.conda/envs/active/lib/python3.8/site-packages/torch/nn/functional.py", line 2266, in nll_loss
    ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
RuntimeError: 1only batches of spatial targets supported (3D tensors) but got targets of size: : [4, 513, 513, 3]

My image shape is [4, 3, 513, 513] and my output shape is [4, 513, 513, 3]. Do you have any idea how to resolve this?

Based on the error message it seems that [4, 513, 513, 3] is the target shape, not the model output shape.
In any case: the model is supposed to output a tensor of shape [batch_size, nb_classes, height, width], while the target should have the shape [batch_size, height, width] and contain the class indices in the range [0, nb_classes - 1].
If your target has 3 channels, I guess that it might be a color image. In that case you would need to map the colors to the class indices first.
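As a quick illustration of those shapes with random tensors:

```python
import torch
import torch.nn.functional as F

batch_size, nb_classes, height, width = 4, 3, 513, 513

output = torch.randn(batch_size, nb_classes, height, width)         # model logits
target = torch.randint(0, nb_classes, (batch_size, height, width))  # class indices, dtype long

loss = F.cross_entropy(output, target)  # works: output [N, C, H, W], target [N, H, W]
```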

Hi @ptrblck, thank you so much!! That was it: I should’ve converted the masks from RGB down to a single channel. One strange thing that happened was that my segmentation masks should only contain 3 values (0, 1, and 2 for my 3 classes), but when I read them in, the value 255 was also part of the masks. Have you ever encountered that before?

Your segmentation masks are supposed to have values in [0, 1, 2], if you are trying to predict 3 classes in the segmentation output, so this sounds correct.
The 255 values are most likely coming from the image format you are loading.
Are your segmentation masks currently color-encoded, i.e. does e.g. the color “red” refer to class0 and “blue” to class1?
If so, note that these color-encoded masks will have 3 channels (RGB) and will use the standard uint8 value range.
Red would thus be [255, 0, 0], Blue [0, 0, 255], etc., and you would need to map these colors to the class labels first. This post gives an example of how this mapping could be applied.
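A small sketch of such a mapping (the palette here is made up; adapt it to your own masks):

```python
import torch

# assumed palette: RGB color -> class index
palette = {
    (255, 0, 0): 0,  # red   -> class 0
    (0, 255, 0): 1,  # green -> class 1
    (0, 0, 255): 2,  # blue  -> class 2
}

def rgb_to_class(mask):
    # mask: uint8 tensor of shape [3, H, W]
    target = torch.zeros(mask.shape[1:], dtype=torch.long)
    for color, idx in palette.items():
        color_t = torch.tensor(color, dtype=mask.dtype).view(3, 1, 1)
        target[(mask == color_t).all(dim=0)] = idx
    return target  # [H, W], values in [0, nb_classes - 1]
```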

Ah okay, this is so helpful. Thank you so much @ptrblck!!