- I am training a semantic segmentation model, specifically the DeepLabV3 model from torchvision.
- I am training this model on the CIHP dataset, which consists of human images with 20 class labels for different body parts (arm, leg, face, etc.).
- I am lost as to how to compute the loss for the following tensors:
    input.shape = (batch_size, 3, 512, 512)
    mask.shape = (batch_size, 1, 512, 512)
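The output of the model:
    from torchvision import models
    from torchvision.models.segmentation.deeplabv3 import DeepLabHead

    model = models.segmentation.deeplabv3_resnet101(pretrained=False)
    model.classifier = DeepLabHead(2048, 20)  # 20 output channels, one per class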
is a tensor of shape
(batch_size, 20, 512, 512), where 20 is the number of possible classes.
When I try to calculate the loss as follows:
loss = criterion(outputs['out'], masks)
I get the following warning:
Using a target size (torch.Size([10, 1, 512, 512])) that is different to the input size (torch.Size([10, 20, 512, 512])). This will likely lead to incorrect results due to broadcasting.
The warning makes sense to me, but then how would one calculate the loss for this type of model?
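
For reference, one common way to handle this kind of shape mismatch is a per-pixel classification loss such as `nn.CrossEntropyLoss`, which takes logits of shape (N, C, H, W) and a target of shape (N, H, W) holding integer class indices. The snippet below is only a minimal sketch: it assumes the mask values are class indices in the range 0 to 19 (not confirmed above), and it uses random stand-in tensors with a smaller spatial size in place of `outputs['out']` and `masks`. The warning quoted above looks like the one raised by element-wise losses such as `nn.MSELoss`, so the `criterion` in use may need to be swapped as well.

```python
import torch
import torch.nn as nn

num_classes = 20
batch_size, height, width = 2, 64, 64  # small stand-in sizes instead of 10 x 512 x 512

# Stand-in for outputs['out']: raw per-class scores (logits), shape (N, C, H, W)
logits = torch.randn(batch_size, num_classes, height, width)

# Stand-in for masks: one class index per pixel, shape (N, 1, H, W).
# Assumption: mask values are integers in [0, num_classes - 1].
masks = torch.randint(0, num_classes, (batch_size, 1, height, width))

criterion = nn.CrossEntropyLoss()

# CrossEntropyLoss with class-index targets expects shape (N, H, W) and dtype long,
# so drop the singleton channel dimension and cast.
target = masks.squeeze(1).long()

loss = criterion(logits, target)
print(logits.shape, target.shape, loss.item())
```

With the real tensors, the equivalent call would be `criterion(outputs['out'], masks.squeeze(1).long())`, again assuming the masks store class indices rather than colours or one-hot values.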