Loss function for multi-class semantic segmentation


  1. I am training a semantic segmentation model, specifically the deeplabv3 model from torchvision.
  2. I am training this model on the CIHP dataset, a dataset consisting of human images and 20 class labels for different body parts (arm, leg, face etc…)
  3. I am lost as to how to compute the loss for the following tensors:
input.shape = (batch_size, 3, 512, 512)
mask.shape = (batch_size, 1, 512, 512)

the output of the model:

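from torchvision import models
from torchvision.models.segmentation.deeplabv3 import DeepLabHead
model = models.segmentation.deeplabv3_resnet101(pretrained=False)
model.classifier = DeepLabHead(2048, 20)  # 20 output classes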

is a tensor of shape (batch_size, 20, 512, 512), where 20 are the different possible classes.

When I try to calculate the loss as follows:
loss = criterion(outputs['out'], masks)

I get the following warning:

Using a target size (torch.Size([10, 1, 512, 512])) that is different to the input size (torch.Size([10, 20, 512, 512])). This will likely lead to incorrect results due to broadcasting.

The warning makes sense to me, but then how would one calculate the loss for this type of model?


I assume you are working on a multi-class segmentation use case, so nn.CrossEntropyLoss would be the appropriate criterion. The warning you posted doesn’t seem to come from that loss function (it is typically raised by element-wise losses such as nn.MSELoss), so you might want to switch to nn.CrossEntropyLoss. Once this is done, remove the “channel dimension” from your mask tensors, as it is not expected. nn.CrossEntropyLoss expects the model output in the shape [batch_size, nb_classes, height, width], containing the logits for each pixel, and a target tensor in the shape [batch_size, height, width], containing the class index for each pixel in the range [0, nb_classes-1]. The target should also be of dtype torch.long.
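Here is a minimal sketch of the shapes described above. It uses random tensors in place of the real model output and masks, and a small 8x8 resolution instead of 512x512 so that it runs quickly. It also assumes the masks already hold integer class indices in [0, 19]. The variable names are just for illustration.

```python
import torch
import torch.nn as nn

# Small stand-in sizes (assumption); the thread uses 512x512 images.
batch_size, nb_classes, height, width = 2, 20, 8, 8

# Stand-in for outputs['out']: raw logits, one channel per class.
logits = torch.randn(batch_size, nb_classes, height, width)

# Stand-in for the masks as loaded: class indices with an extra channel dim.
masks = torch.randint(0, nb_classes, (batch_size, 1, height, width))

# Drop the channel dimension and make sure the dtype is long (int64).
target = masks.squeeze(1).long()

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)  # scalar, averaged over all pixels by default
print(target.shape, loss.item())
```

In the original training loop this would correspond to something like criterion(outputs['out'], masks.squeeze(1).long()), provided the masks really contain class indices and not, say, RGB-encoded colors.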

Thank you very much!! Very helpful.