Semantic segmentation loss function / shape of prediction and target


For a research project I'm currently implementing an RNN to solve a semantic segmentation task.

The target segmentation maps have the shape (c, h, w), which is (11, 64, 84) in my example, so there are 64*84 (one per pixel) one-hot vectors of length 11, with a 1 at the index of the class.
My network also predicts a tensor of the same shape, where the vectors for each pixel are the output of a nn.LogSoftmax layer.

If I now use nn.NLLLoss and try to calculate

loss = criterion(output, target)

I get

Expected target size (11, 84), got torch.Size([11, 64, 84])

So I read that NLLLoss expects class labels, but I'm sure there must be a way to use what I have without first applying another function, since it seems to contain all the necessary information.

What did I misunderstand? Is there a better way to solve this?
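For reference, here is a minimal sketch that reproduces the error with the shapes above (the random tensors just stand in for the actual network output and segmentation map):

```python
import torch
import torch.nn as nn

c, h, w = 11, 64, 84

# fake per-pixel log-probabilities, shape (c, h, w), no batch dimension
output = nn.LogSoftmax(dim=0)(torch.randn(c, h, w))

# one-hot target of the same shape, e.g. every pixel labelled class 0
target = torch.zeros(c, h, w)
target[0] = 1.0

criterion = nn.NLLLoss()
raised = False
try:
    loss = criterion(output, target)
except RuntimeError as e:
    raised = True
    # NLLLoss interprets (11, 64, 84) as (N=11, C=64, d1=84), so it
    # expects a target of size (11, 84) and rejects (11, 64, 84)
    print(e)
```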

It seems your target is currently one-hot encoded.
If you are using nn.NLLLoss, the target is expected to contain the class indices instead (without the channel dimension).
In case your current target shape is [batch_size, c, h, w], try to convert it using:

target = torch.argmax(target, 1)
loss = criterion(output, target)
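Putting it together for the unbatched (c, h, w) shape from the question (the one-hot target here is constructed randomly just for illustration; note that without a batch dimension the class axis is dim 0, not dim 1, and `unsqueeze(0)` adds the batch dimension NLLLoss expects):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

c, h, w = 11, 64, 84

# build a random one-hot target of shape (c, h, w) for illustration
classes = torch.randint(0, c, (h, w))
target_onehot = F.one_hot(classes, num_classes=c).permute(2, 0, 1).float()

# fake network output: per-pixel log-probabilities over the class dim
output = nn.LogSoftmax(dim=0)(torch.randn(c, h, w))

criterion = nn.NLLLoss()

# convert one-hot (c, h, w) -> class indices (h, w)
target = torch.argmax(target_onehot, dim=0)

# add a batch dimension: shapes become (1, c, h, w) and (1, h, w)
loss = criterion(output.unsqueeze(0), target.unsqueeze(0))
```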