Since it’s not a built-in loss function it would depend on your implementation.
If I remember it correctly the implementations I’ve seen were using a one-hot encoded target.
You mean the target would contain the class indices for each pixel location? I’m currently stuck at this stage. I have it as [batch_size, num_classes, height, width].
Thanks!
Based on your shape, it seems the target is one-hot encoded, which is wrong if you are using nn.CrossEntropyLoss.
Here is an example code snippet showing what the target mask should look like:
import torch

batch_size, nb_classes, h, w = 2, 10, 5, 5
target = torch.randint(0, nb_classes, (batch_size, h, w))
print(target)
> tensor([[[1, 0, 4, 9, 4],
[2, 0, 8, 0, 4],
[3, 8, 9, 8, 5],
[6, 1, 1, 5, 1],
[5, 3, 1, 5, 1]],
[[6, 0, 2, 4, 4],
[7, 3, 6, 4, 9],
[4, 4, 2, 7, 2],
[1, 5, 8, 9, 9],
[0, 4, 5, 3, 7]]])
All right, that is much clearer, thanks.
I’m still not sure how to get from there to here, though, but I guess that’s a lack of understanding on my part, and I’ll try to get more familiar with the subject.
For now, I’m training the model with a Dice loss and the training is running smoothly at the moment.
Thanks a lot! Your answers are always a great source of knowledge for me :))
A multi-class Dice loss can be implemented using one-hot encoded targets, while you would need the “class index” target for e.g. nn.CrossEntropyLoss. To create it, you can use target = torch.argmax(one_hot_target, dim=1) on the one-hot encoded target tensor.
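As a rough sketch (assuming the one-hot target has the class dimension at dim 1, as in the shape you posted; all sizes here are made up for illustration):

```python
import torch

batch_size, nb_classes, h, w = 2, 6, 4, 4

# Hypothetical class-index target, used here only to build a one-hot tensor
class_indices = torch.randint(0, nb_classes, (batch_size, h, w))

# One-hot encoded target with shape [batch_size, nb_classes, h, w]
one_hot_target = torch.nn.functional.one_hot(class_indices, nb_classes).permute(0, 3, 1, 2)

# Recover the class-index target expected by nn.CrossEntropyLoss
target = torch.argmax(one_hot_target, dim=1)
print(target.shape)  # torch.Size([2, 4, 4])
```

The argmax along dim=1 picks the index of the active class channel for each pixel, which round-trips back to the original class indices.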
Thanks a lot, but now I’m getting this error:
RuntimeError: The size of tensor a (8) must match the size of tensor b (6) at non-singleton dimension 1
This error might be raised due to a wrong reshaping. Could you post the shape of the model output and target as well as how you are calculating the loss?
Thanks a lot for replying!
I’m still trying to resolve this.
So this is the last Conv layer for the model: Conv2d(32, 6, kernel_size=(1, 1), stride=(1, 1))
and, this is the target shape: [8, 224, 224]
I didn’t apply any activation function on the output and applied CrossEntropyLoss.
Now I changed the batch size to 32 and end up with the same error but this time:
RuntimeError: The size of tensor a (32) must match the size of tensor b (6) at non-singleton dimension 1
Not sure what the cause is.
Based on the setup of the last conv layer, I assume you are dealing with 6 classes in your multi-class segmentation.
If that’s the case, nn.CrossEntropyLoss expects the model output to have the shape [batch_size, 6, height, width], while the target should have the shape [batch_size, height, width] and contain class indices in the range [0, 5].
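A minimal sketch of these shapes, using the batch size of 8 and 6 classes from your post (the spatial size of 224x224 is taken from your target shape):

```python
import torch
import torch.nn as nn

batch_size, nb_classes, h, w = 8, 6, 224, 224

# Raw logits from the model (no activation applied), shape [8, 6, 224, 224]
output = torch.randn(batch_size, nb_classes, h, w)

# Class-index target, shape [8, 224, 224], values in [0, 5]
target = torch.randint(0, nb_classes, (batch_size, h, w))

criterion = nn.CrossEntropyLoss()
loss = criterion(output, target)
print(loss.item())
```

Note that the target has no class dimension at all; each pixel stores a single integer class index.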
Yes, I’m sure about the first part, and that is exactly the shape of my model output, but not the second one.
I have implemented cross entropy for text labels, but never for images, so this is hard for me to visualize. I mean the [0, 5] values within the [batch_size, height, width] tensor, and I couldn’t find examples for this case.
I’ll go back and read more about the CrossEntropyLoss basics, then come back with a deeper understanding.
Thank you so much!
A simple example would be:
import torch

batch_size = 2
nb_classes = 6
height, width = 10, 10
target = torch.randint(0, nb_classes, (batch_size, height, width))
print(target)
Maybe it’ll help visualizing it.
Awesome!
Now I feel dumb for not getting it the first time.
Thank you so much. I’ll check the values of my mask then, because this is the only part I didn’t check.
Thanks again, I’ll let you know if it works!
Hey, can I ask how you used nn.CrossEntropyLoss()?
Is it possible to use nn.BCEWithLogitsLoss in a multi-class segmentation case?
For example, the output tensor has the shape [batch_size, nb_classes, height, width], and I one-hot encode the label to go from [batch_size, height, width] to [batch_size, nb_classes, height, width], so that the label and the output have the same shape?
Thank you!
You could use nn.BCEWithLogitsLoss
for the mentioned output and target shapes, but it would be considered a multi-label segmentation (where each pixel could belong to zero, one, or multiple classes), not a multi-class segmentation (where each pixel would belong to a single class only).
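A sketch of the multi-label setup with nn.BCEWithLogitsLoss (sizes are made up; the binary random target stands in for a real multi-label mask):

```python
import torch
import torch.nn as nn

batch_size, nb_classes, h, w = 2, 6, 4, 4

# Logits with shape [batch_size, nb_classes, h, w]
output = torch.randn(batch_size, nb_classes, h, w)

# Multi-label target: each pixel can be active in zero, one, or several
# class channels independently; note the float dtype required by the loss
target = torch.randint(0, 2, (batch_size, nb_classes, h, w)).float()

criterion = nn.BCEWithLogitsLoss()
loss = criterion(output, target)
print(loss.item())
```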
Thank you.
Let me summarize to see if my understanding is correct:
- If I want to do a multi-label segmentation, I should use nn.BCEWithLogitsLoss. I have to one-hot encode the target and make its shape [batch_size, nb_classes, height, width] so it matches the output tensor’s shape.
- If I want to do a multi-class segmentation, I should use nn.CrossEntropyLoss. In this case, I don’t need to reshape my target at all.
Feel free to point out my mistakes.
Your understanding is correct. A small correction: when using nn.BCEWithLogitsLoss, your target can contain floating point numbers in [0, 1], so it doesn’t have to be one-hot encoded (you could use them as “soft targets”).
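For example (a sketch with made-up sizes), a random float target in [0, 1] is accepted directly as a soft target:

```python
import torch
import torch.nn as nn

batch_size, nb_classes, h, w = 2, 6, 4, 4

output = torch.randn(batch_size, nb_classes, h, w)

# "Soft" target: floating point values in [0, 1] instead of a strict 0/1 one-hot
target = torch.rand(batch_size, nb_classes, h, w)

loss = nn.BCEWithLogitsLoss()(output, target)
print(loss.item())
```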
If I don’t one-hot encode my target. what can I do to make the target’s shape be [batch_size, nb_classes, height ,width]
? because my original target is of size [batch_size, height ,width]
.
Note that the small correction didn’t claim that a one-hot encoded target is wrong, just that it’s not strictly necessary, since floating point targets in [0, 1] are also valid.
In any case, if you want to transform class indices into a one-hot encoded target, you could use F.one_hot(target, num_classes).
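One detail worth noting (sizes below are made up): F.one_hot appends the class dimension last, so a permute is needed to get the [batch_size, nb_classes, height, width] layout, plus a cast to float for nn.BCEWithLogitsLoss:

```python
import torch
import torch.nn.functional as F

batch_size, nb_classes, h, w = 2, 6, 4, 4

# Class-index target with shape [batch_size, h, w]
target = torch.randint(0, nb_classes, (batch_size, h, w))

# F.one_hot puts the class dimension last: [batch_size, h, w, nb_classes]
one_hot = F.one_hot(target, nb_classes)

# Move classes to dim 1 and cast to float for nn.BCEWithLogitsLoss
one_hot = one_hot.permute(0, 3, 1, 2).float()
print(one_hot.shape)  # torch.Size([2, 6, 4, 4])
```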
I understand the one-hot encoding is not necessary. When I used nn.BCEWithLogitsLoss, it raised a dimension mismatch error (the model output tensor [batch_size, nb_classes, height, width] doesn’t match the target tensor [batch_size, height, width]), so I’m wondering what I can do to fix it without one-hot encoding.