Multiclass Segmentation

Since it’s not a built-in loss function, it would depend on your implementation.
If I remember correctly, the implementations I’ve seen were using a one-hot encoded target.

Do you mean the target should contain the class indices for each pixel location? I’m currently stuck at this stage. I have it as [batch_size, num_classes, height, width].
Thanks!

Based on your shape, it seems the target is one-hot encoded, which is wrong if you are using nn.CrossEntropyLoss.
Here is an example code snippet to show what the target mask should look like:

import torch

batch_size, nb_classes, h, w = 2, 10, 5, 5
target = torch.randint(0, nb_classes, (batch_size, h, w))
print(target)
> tensor([[[1, 0, 4, 9, 4],
           [2, 0, 8, 0, 4],
           [3, 8, 9, 8, 5],
           [6, 1, 1, 5, 1],
           [5, 3, 1, 5, 1]],

          [[6, 0, 2, 4, 4],
           [7, 3, 6, 4, 9],
           [4, 4, 2, 7, 2],
           [1, 5, 8, 9, 9],
           [0, 4, 5, 3, 7]]])

All right, that is much clearer, thanks.
I’m still not sure how to get from there to here, though,
but I guess that’s a lack of understanding on my part, and I’ll try to get more familiar with the subject.
For now, I’m training the model with a Dice loss and the training is running smoothly.
Thanks a lot! Your answers are always a great source of knowledge for me :))

A multi-class Dice loss can be implemented using one-hot encoded targets, while you would need the “class index” target for e.g. nn.CrossEntropyLoss. To create it, you can use target = torch.argmax(one_hot_target, dim=1) on the one-hot encoded target tensor.
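As a quick sketch (the shapes here are just placeholders for illustration), converting a one-hot encoded target back to the “class index” format could look like this:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes for illustration
batch_size, nb_classes, h, w = 2, 6, 4, 4

# Build a one-hot encoded target from random class indices:
# F.one_hot puts the class dim last, so permute to [batch_size, nb_classes, h, w]
class_indices = torch.randint(0, nb_classes, (batch_size, h, w))
one_hot_target = F.one_hot(class_indices, nb_classes).permute(0, 3, 1, 2)

# argmax over the class dimension recovers the class index for each pixel
target = torch.argmax(one_hot_target, dim=1)
print(target.shape)  # torch.Size([2, 4, 4])
```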

2 Likes

Thanks a lot, but now I’m getting this error:
RuntimeError: The size of tensor a (8) must match the size of tensor b (6) at non-singleton dimension 1

This error might be raised due to a wrong reshaping. Could you post the shape of the model output and target as well as how you are calculating the loss?

Thanks a lot for replying!
I’m still trying to resolve this.
So this is the last conv layer of the model: Conv2d(32, 6, kernel_size=(1, 1), stride=(1, 1)),
and this is the target shape: [8, 224, 224].
I didn’t apply any activation function on the output and used CrossEntropyLoss.
Now I changed the batch size to 32 and end up with the same error, but this time:
RuntimeError: The size of tensor a (32) must match the size of tensor b (6) at non-singleton dimension 1
Not sure what the cause is.

Based on the setup of the last conv layer, I assume you are dealing with 6 classes for your multi-class segmentation.
If that’s the case, nn.CrossEntropyLoss expects the model output to have the shape [batch_size, 6, height, width], while the target should have the shape [batch_size, height, width] and contain class indices in the range [0, 5] (i.e. [0, nb_classes - 1]).
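A minimal sketch of these shapes with nn.CrossEntropyLoss (the sizes below just mirror the ones mentioned in this thread):

```python
import torch
import torch.nn as nn

batch_size, nb_classes, h, w = 8, 6, 224, 224

# Model output: raw logits with one channel per class, no activation applied
output = torch.randn(batch_size, nb_classes, h, w)

# Target: one class index per pixel, values in [0, nb_classes - 1]
target = torch.randint(0, nb_classes, (batch_size, h, w))

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss.item())
```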

Yes, I’m sure about the first part, and that is exactly the shape of my model output, but not the second one.
I have implemented cross entropy for text labels, but never for images, so this is hard for me to visualize. I mean the values in [0, 5] within the [batch_size, height, width] tensor, and I couldn’t find examples for this case.

I’ll try to go back and read more about CrossEntropyLoss basics, then come back with a deeper understanding.
Thank you so much!

A simple example would be:

import torch

batch_size = 2
nb_classes = 6
height, width = 10, 10
target = torch.randint(0, nb_classes, (batch_size, height, width))
print(target)

Maybe it’ll help you visualize it.

Awesome!
Now I feel dumb for not getting it the first time :sweat_smile:
Thank you so much. I’ll check the values of my mask then, because this is the only part I didn’t check.
Thanks again I’ll let you know if it works!

Hey, can I ask how you used nn.CrossEntropyLoss()?

Is it possible to use nn.BCEWithLogitsLoss in a multi-class segmentation case?
For example, the output tensor would have the shape [batch_size, nb_classes, height, width], and the label would be one-hot encoded from [batch_size, height, width] to [batch_size, nb_classes, height, width], so that the label and the output have the same shape.
Thank you!

You could use nn.BCEWithLogitsLoss for the mentioned output and target shapes, but it would be considered a multi-label segmentation (where each pixel could belong to zero, one, or multiple classes), not a multi-class segmentation (where each pixel would belong to a single class only).
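A small sketch of that multi-label setup (the shapes are placeholders), where the target is a float tensor with one channel per class:

```python
import torch
import torch.nn as nn

batch_size, nb_classes, h, w = 2, 6, 4, 4

# Model output: raw logits, one channel per class
output = torch.randn(batch_size, nb_classes, h, w)

# Multi-label target: each pixel can be active in zero, one, or several channels
target = torch.randint(0, 2, (batch_size, nb_classes, h, w)).float()

criterion = nn.BCEWithLogitsLoss()
loss = criterion(output, target)
print(loss.item())
```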

Thank you.
Let me summarize to see if my understanding is correct.

  • If I want to do multi-label segmentation, I should use nn.BCEWithLogitsLoss. I have to one-hot encode the target and make its shape [batch_size, nb_classes, height, width] so it matches the output tensor’s shape.
  • If I want to do multi-class segmentation, I should use nn.CrossEntropyLoss. In this case, I don’t need to reshape my target.

Feel free to point out my mistakes.

Your understanding is correct. A small correction: when using nn.BCEWithLogitsLoss, your target can contain floating point numbers in [0, 1], so it doesn’t have to be one-hot encoded (you could use these as “soft targets”).
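For example, a “soft target” sketch (the values are drawn uniformly from [0, 1] just for illustration):

```python
import torch
import torch.nn as nn

batch_size, nb_classes, h, w = 2, 6, 4, 4

output = torch.randn(batch_size, nb_classes, h, w)  # raw logits

# Soft targets: arbitrary floats in [0, 1], not restricted to 0/1 one-hot entries
target = torch.rand(batch_size, nb_classes, h, w)

loss = nn.BCEWithLogitsLoss()(output, target)
print(loss.item())
```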

If I don’t one-hot encode my target, what can I do to make the target’s shape [batch_size, nb_classes, height, width]? My original target has the shape [batch_size, height, width].

Note that the small correction didn’t claim that a one-hot encoded target is wrong, just that it’s not strictly necessary, since floating point targets in [0, 1] are also valid.
In any case, if you want to transform class indices into a one-hot encoded target, you could use F.one_hot(target, num_classes).
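Note that F.one_hot adds the class dimension last, so you would likely need a permute (and a cast to float for nn.BCEWithLogitsLoss). A small sketch with placeholder shapes:

```python
import torch
import torch.nn.functional as F

nb_classes = 6
target = torch.randint(0, nb_classes, (2, 4, 4))  # [batch_size, height, width]

one_hot = F.one_hot(target, nb_classes)  # [batch_size, height, width, nb_classes]

# Move the class dim to position 1 and cast to float for nn.BCEWithLogitsLoss
one_hot = one_hot.permute(0, 3, 1, 2).float()  # [batch_size, nb_classes, height, width]
print(one_hot.shape)  # torch.Size([2, 6, 4, 4])
```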

I understand that one-hot encoding is not necessary. When I used nn.BCEWithLogitsLoss, it raised a dimension mismatch error (the model output tensor [batch_size, nb_classes, height, width] doesn’t match the target tensor [batch_size, height, width]), so I’m wondering what I can do to fix it without one-hot encoding.