Loss function for segmentation

Hello everyone. I am trying to build a segmentation network (a U-Net), but I am a little confused.
My network output has shape y_pred >> (3, 960, 960), and I have labels with the same shape, y_real >> (3, 960, 960). I know that I could use one channel instead of three by converting the RGB image to grayscale, which would give y_pred >> (1, 960, 960) and y_real >> (1, 960, 960).
The problem is that cross-entropy loss and other losses such as BCELoss do not seem to support images (2D targets).
What should I do for my U-Net loss function?

I am not an expert, but just today I was reading a blog post that mentions something about that.
They use MSELoss.

Here is the blog post:
https://towardsdatascience.com/colorizing-black-white-images-with-u-net-and-conditional-gan-a-tutorial-81b2df111cd8

I hope it helps.

Hi Amin!

Most typically for semantic segmentation you will classify each pixel in
your image into one of nClass classes (including what’s usually called
a “background” class).

The input to your model will have shape [nBatch, nChannel, H, W]
(where the batch size, nBatch, is not built into the model, and the
model is (usually) agnostic to it). For an RGB image, nChannel would
be 3.

The final layer of your U-Net model would typically be a Conv2d with
out_channels = nClass and the output of the model would have
shape [nBatch, nClass, H, W].
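
As a rough sketch of those shapes (the tiny two-layer model below is only
a stand-in for a real U-Net, and the sizes and the 16 hidden channels are
assumptions chosen for illustration):

```python
import torch
import torch.nn as nn

nBatch, nChannel, H, W = 2, 3, 32, 32   # illustrative sizes, not from the thread
nClass = 3                               # assumed number of classes

# Stand-in for a U-Net: what matters here is only the final Conv2d,
# whose out_channels equals nClass.
model = nn.Sequential(
    nn.Conv2d(nChannel, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, nClass, kernel_size=1),   # final layer: one logit map per class
)

images = torch.randn(nBatch, nChannel, H, W)   # fake RGB batch
logits = model(images)
print(logits.shape)   # [nBatch, nClass, H, W]
```

The batch size never appears in the model definition, so the same model
accepts any nBatch.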

In this situation you would typically use (or at least start with)
CrossEntropyLoss as the loss function, and your ground-truth
labels, the target passed to CrossEntropyLoss, will have shape
[nBatch, H, W] (no nClass dimension), and consist of integer
class labels that run from 0 to nClass - 1.
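
If your labels are stored as color-coded RGB images, one possible way to
get such integer class labels is sketched below. The palette (which color
means which class) and the helper name rgb_mask_to_class_indices are
hypothetical; you would substitute the colors your own dataset uses.

```python
import torch

# Hypothetical palette: row i is the RGB color that encodes class i.
palette = torch.tensor([[0, 0, 0],       # class 0: background
                        [255, 0, 0],     # class 1
                        [0, 255, 0]],    # class 2
                       dtype=torch.uint8)

def rgb_mask_to_class_indices(mask_rgb, palette):
    """Map a [3, H, W] color-coded mask to an [H, W] int64 class-index map."""
    # Compare every pixel with every palette color -> [nClass, H, W] booleans.
    matches = (mask_rgb.unsqueeze(0) == palette[:, :, None, None]).all(dim=1)
    # Index of the matching color per pixel. Pixels that match no color
    # fall back to class 0 here, which is an assumption of this sketch.
    return matches.long().argmax(dim=0)

# Build a small color-coded mask from a known class map and convert it back.
class_map = torch.tensor([[0, 1],
                          [2, 1]])
mask_rgb = palette[class_map].permute(2, 0, 1)    # [3, H, W]
recovered = rgb_mask_to_class_indices(mask_rgb, palette)
target = recovered.unsqueeze(0)                   # [nBatch, H, W] with nBatch = 1
print(target.shape, target.dtype)
```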

Best.

K. Frank

Thanks a million, Mr. Hurtado.

Hi KFrank,
Thanks for your quick response. If I understand correctly, I should create y_label with 3 classes, and likewise the output of my network.
Here the batch size is 1, H = 4, and W = 3, in one channel, as in the code below:

y_label = [[[1, 0, 2],
            [1, 1, 0],
            [2, 0, 0],
            [1, 0, 0]]]
y_label = torch.FloatTensor(y_label)
# print(y_label.shape) >> torch.Size([1, 4, 3])

y_target = [[[0.950, 0.210, 0.890],
             [0.875, 0.254, 0.248],
             [0.875, 0.012, 0.148],
             [0.542, 0.143, 0.002]]]
y_target = torch.FloatTensor(y_target)
# print(y_target.shape) >> torch.Size([1, 4, 3])

loss_func = nn.CrossEntropyLoss()
loss = loss_func(y_label, y_target)

But I get an error:
ValueError: Expected target size (1, 3), got torch.Size([1, 4, 3])

Hi Amin!

No, your understanding is not quite right.

Your y_label is almost correct (assuming that y_label holds your
ground-truth labels, provided externally, rather than values predicted
by your network). The shape and values are fine, but this needs
to be a LongTensor (rather than a FloatTensor). Thus:

y_label_list = [[[1, 0, 2],
                 [1, 1, 0],
                 [2, 0, 0],
                 [1, 0, 0]]]
y_label = torch.LongTensor (y_label_list)
# or
y_label = torch.tensor (y_label_list, dtype = torch.int64)
# or just
y_label = torch.tensor (y_label_list)

(This will be the second argument passed to a CrossEntropyLoss
object and is referred to in the CrossEntropyLoss documentation
as target.)
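
If the labels already exist as a float tensor, a cast is enough. A small
sketch, using random numbers as a stand-in for the network output:

```python
import torch

# Class indices that were (mistakenly) stored as floats.
y_label_float = torch.FloatTensor([[[1, 0, 2],
                                    [1, 1, 0],
                                    [2, 0, 0],
                                    [1, 0, 0]]])

y_label = y_label_float.long()      # cast to int64; shape stays [1, 4, 3]
y_pred = torch.randn(1, 3, 4, 3)    # random stand-in for [nBatch, nClass, H, W] logits

loss = torch.nn.CrossEntropyLoss()(y_pred, y_label)
print(y_label.dtype, y_label.shape)
```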

If your y_target is meant to be the prediction made by your network
(your network’s output), then it isn’t right.

(As a matter of nomenclature, I would not call the output of your
network “target” as “target” is normally used to denote the
ground-truth labels. I like to call it “prediction.” This becomes the
first argument to your CrossEntropyLoss object and is referred
to in the documentation as input. Thus the output of your network
is the “prediction” and is the input to CrossEntropyLoss.)

Let me use y_pred as the name for what you call y_target.

Your y_pred is missing its nClass dimension. That is, it should
have shape [nBatch, nClass, H, W], so, using the values in your
example, [1, 3, 4, 3].
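
This also explains the error message. CrossEntropyLoss reads dimension 1
of its first argument as the class dimension, so a [1, 4, 3] tensor is
taken to mean nBatch = 1, nClass = 4, with 3 remaining positions. The
helper below is hypothetical (it is not part of PyTorch) and only mimics
that shape rule for class-index targets:

```python
import torch

def expected_target_shape(input_shape):
    # dim 0 is nBatch, dim 1 is nClass; a class-index target keeps
    # every dimension except the class dimension.
    return tuple(input_shape[:1]) + tuple(input_shape[2:])

wrong_input = torch.zeros(1, 4, 3)       # the [1, 4, 3] tensor passed first
right_input = torch.zeros(1, 3, 4, 3)    # [nBatch, nClass, H, W]

print(expected_target_shape(wrong_input.shape))   # (1, 3)
print(expected_target_shape(right_input.shape))   # (1, 4, 3)
```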

For each “pixel” in your predicted “image” you have three (nClass)
logits that are the raw scores for that pixel being in each of your
three classes. Thus y_pred[0, :, 2, 1] is the 1d vector of three
logits that predicts to which of your three classes your “2-1” pixel
belongs.
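
A short sketch of that indexing, again with random logits standing in for
a real network output:

```python
import torch

_ = torch.manual_seed(2021)
y_pred = torch.randn(1, 3, 4, 3)            # [nBatch, nClass, H, W]

pixel_logits = y_pred[0, :, 2, 1]            # three raw scores for pixel "2-1"
pixel_probs = torch.softmax(pixel_logits, dim=0)   # probabilities over the classes

# Predicted class for every pixel: argmax over the class dimension.
class_map = y_pred.argmax(dim=1)             # [nBatch, H, W]
print(pixel_logits.shape, class_map.shape)
```

Note that CrossEntropyLoss expects the raw logits, so the softmax here is
only for inspecting the prediction, not something to apply before the loss.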

Based on what I think your variables are supposed to be, you have
your two arguments to loss_func passed in the wrong order.

Using my preferred y_pred (instead of your y_target), this should
be:

loss_func = nn.CrossEntropyLoss()
loss = loss_func (y_pred, y_label)

Thus:

>>> import torch
>>> torch.__version__
'1.9.0'
>>> _ = torch.manual_seed (2021)
>>> y_label = [[[1, 0, 2],
...             [1, 1, 0],
...             [2, 0, 0],
...             [1, 0, 0]]]
>>> y_label = torch.tensor (y_label)
>>> y_pred = torch.randn (1, 3, 4, 3)
>>> loss_func = torch.nn.CrossEntropyLoss()
>>> loss = loss_func (y_pred, y_label)
>>> loss
tensor(1.5173)
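
To see what that number means, the sketch below recomputes the loss by
hand, assuming the default settings (mean reduction, no class weights):
take the log-softmax over the class dimension, pick the entry for each
pixel's true class, and average the negatives over all pixels.

```python
import torch
import torch.nn.functional as F

_ = torch.manual_seed(2021)
y_label = torch.tensor([[[1, 0, 2],
                         [1, 1, 0],
                         [2, 0, 0],
                         [1, 0, 0]]])
y_pred = torch.randn(1, 3, 4, 3)                       # [nBatch, nClass, H, W]

log_probs = F.log_softmax(y_pred, dim=1)               # normalize over classes
# Select, for each pixel, the log-probability of its ground-truth class.
picked = log_probs.gather(1, y_label.unsqueeze(1))     # [nBatch, 1, H, W]
manual_loss = -picked.mean()                           # average over all pixels

builtin_loss = torch.nn.CrossEntropyLoss()(y_pred, y_label)
print(torch.allclose(manual_loss, builtin_loss))
```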

Best.

K. Frank

Thanks a million, KFrank,
it works for me.
Sending a million kisses :kissing_heart: :heart_eyes: