Hi, I am implementing a UNet for semantic segmentation, and I have a dataset of images and label images (three classes). I have confused myself about the label images. Should the label images be tensors of class indices (like 1, 2, 3) or the raw pixel values?
I used the raw images (loaded the images and labels and converted them to tensors) and fed them to the UNet with a cross-entropy loss. It first said the labels needed to be long tensors (so I converted them). Now I am getting this error.
Any guidance on what I am missing or doing wrong is much appreciated.
Thanks in advance
Hello,
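I think that your targets should be something like B x 1 x H x W (nn.CrossEntropyLoss itself takes B x H x W, so the channel dimension has to be squeezed out before the loss),
where B is your batch size, and H and W are the height and width of your image.
The target should be a long tensor with the class index for every pixel (0 if the pixel belongs to class 1, 1 if it belongs to class 2, etc.).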
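To make the shapes concrete, here is a minimal sketch. The sizes and the random tensors are made up for illustration; `logits` only stands in for whatever your UNet outputs.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only
B, C, H, W = 2, 3, 4, 4

logits = torch.randn(B, C, H, W)            # stand-in for the UNet output (one channel per class)
target = torch.randint(0, C, (B, 1, H, W))  # class index per pixel; randint returns int64 (long)

criterion = nn.CrossEntropyLoss()
# The loss takes the target as B x H x W, so drop the singleton channel dimension
loss = criterion(logits, target.squeeze(1))
```

The loss comes back as a single scalar tensor. If the target is passed as B x 3 x H x W raw pixels instead, the shapes no longer line up with what the loss expects.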
Hi. Thank you for your quick response.
This makes sense. My labels are images in black, green and red (three classes), but they are raw pixels at the moment. This is why I get B x 3 x H x W.
So I need to make it B x 1 x H x W: I should iterate through all my label images, convert the three channels to the class index (e.g. black 0, green 1 and red 2), and create masks.
Am I understanding it right?
And if so, is there a way to convert the raw pixels to masks (containing the class index)?
Exactly. You have your three classes, let’s say black, green and red. You should transform your B x 3 x H x W into a B x 1 x H x W. In the end the output should look something like:
[B, B, R, B] [0, 0, 2, 0]
[B, B, R, B] [0, 0, 2, 0]
[B, G, R, R] [0, 1, 2, 2]
(For example, let’s say the image is a red L surrounded by black background, plus a single green pixel.)
I imagine your 3 channels look like this (for my example and the three classes black, green, red respectively):
[1, 1, 0, 1] [0, 0, 0, 0] [0, 0, 1, 0]
[1, 1, 0, 1] [0, 0, 0, 0] [0, 0, 1, 0]
[1, 0, 0, 0] [0, 1, 0, 0] [0, 0, 1, 1]
If you have something like that, you can easily select all the positions containing 1 in the different masks, set them to three different values, and then add the three masks together to fuse them.
I am not sure I am being very clear.
Something like:
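def create_mask(black, green, red):
    # black, green, red: binary (0/1) masks of identical shape, one per class.
    # Weight each mask by its 0-based class index and sum to fuse them.
    # (black contributes 0, so it is written out only for clarity.)
    mask = 0 * black + 1 * green + 2 * red
    return mask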
Note that, if I remember correctly, CrossEntropyLoss expects class indices that start at 0, so be careful about which values you give it.
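Here is the same fusing idea as a small NumPy sketch. It takes the argmax over the stacked masks instead of adding them, which gives 0-based indices directly. `fuse_masks` is just a name I made up, and the three arrays are the per-class masks for the red/green/black toy image above.

```python
import numpy as np

def fuse_masks(black, green, red):
    # Stack the binary masks into a 3 x H x W array; argmax over the class
    # axis returns the index of the channel that holds the 1 at each pixel.
    return np.stack([black, green, red]).argmax(axis=0)

# Per-class binary masks for the toy example (black, green, red)
black = np.array([[1, 1, 0, 1], [1, 1, 0, 1], [1, 0, 0, 0]])
green = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [0, 1, 0, 0]])
red   = np.array([[0, 0, 1, 0], [0, 0, 1, 0], [0, 0, 1, 1]])

mask = fuse_masks(black, green, red)
# mask is [[0, 0, 2, 0], [0, 0, 2, 0], [0, 1, 2, 2]]
```

This assumes every pixel belongs to exactly one class; a pixel that is 0 in all three masks would silently end up as class 0.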
Again, thank you for the quick responses and the help, I really appreciate it.
My labels look like this:
[[[0.00392157 0.06274509 0. ]
[0.15686274 0.01176471 0.03921568]
[0.15686274 0. 0.01176471]
…
[0.7411765 0.00392157 0.00784314]
[0.7529412 0. 0.01568621]
[0.74509805 0. 0.02352941]]
[[0.00392157 0. 0.03529412]
[0.15686274 0. 0.09411764]
[0.16470587 0. 0.0862745 ]
…
[0.7411765 0.00392157 0.00784314]
[0.7529412 0. 0.01568621]
[0.74509805 0. 0.02352941]]
[[0.03921568 0. 0.08235294]
[0.14509803 0. 0.11764705]
[0.14509803 0. 0.0862745 ]
…
[0.74509805 0. 0.00784314]
[0.7529412 0. 0.01568621]
[0.74509805 0. 0.02352941]]
…
[[0.0078432 0.00392157 0.06666666]
[0. 0.00784314 0.05882347]
[0. 0.00784314 0.05882347]
…
[0.74509805 0. 0.00784314]
[0.74509805 0. 0.00784314]
[0.74509805 0. 0.00784314]]
[[0.04313725 0. 0.05882347]
[0.03921568 0. 0.05882347]
[0.03921568 0. 0.05490196]
…
[0.74509805 0. 0.00784314]
[0.74509805 0. 0.00784314]
[0.74509805 0. 0.00784314]]
[[0.05490196 0. 0.01176471]
[0.05098039 0. 0.01176471]
[0.05098039 0. 0.00392157]
…
[0.74509805 0. 0.00784314]
[0.74509805 0. 0.00784314]
[0.74509805 0. 0.00784314]]]
I think I need some sort of color mapping, where I find the red pixels and assign them to class 2, the green pixels to class 1 and the black pixels to class 0.
I am unsure how to read the raw pixel values and write a function for the color mapping. Any guidance on that would be really helpful.
No problem.
So if I understand correctly, your target is just an RGB image with pixel values between 0 and 1 containing only 3 colors?
Yes @SoucheChapich, my labels are images with values between 0 and 1 containing only 3 colors.
Ok, I got it.
I am not an expert, but my first idea would be to convert it back to the [0, 255] range and set three thresholds. Then, for each pixel, look at which range it falls in and assign it one of the 3 class values. For example, in RGB black is (0, 0, 0), so you could set a threshold for pixels close to that value, and the final/target value would be 0 for class 0 (black).
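A rough sketch of that thresholding idea in NumPy, in case it helps. Everything specific here is an assumption on my side: the function name `rgb_to_class_index`, the cut-off of 128, the H x W x 3 layout with channels in R, G, B order, and the tiny 2 x 2 `label` array, which is illustrative rather than your real data.

```python
import numpy as np

def rgb_to_class_index(label, threshold=128):
    """label: H x W x 3 float array in [0, 1] -> H x W int64 array of class indices."""
    rgb = np.rint(label * 255).astype(np.uint8)        # back to the [0, 255] range
    r, g = rgb[..., 0], rgb[..., 1]
    mask = np.zeros(label.shape[:2], dtype=np.int64)   # default: class 0 (black)
    mask[(g >= threshold) & (r < threshold)] = 1       # mostly green -> class 1
    mask[(r >= threshold) & (g < threshold)] = 2       # mostly red   -> class 2
    return mask

# Illustrative 2 x 2 label: near-black, reddish, greenish, dark
label = np.array([[[0.0039, 0.0627, 0.0], [0.7412, 0.0039, 0.0078]],
                  [[0.0, 0.8, 0.02], [0.1569, 0.0118, 0.0392]]])
mask = rgb_to_class_index(label)
# mask is [[0, 2], [1, 0]]
```

Pixels where both red and green are above the cut-off stay in class 0 with this rule, so it is worth checking the class counts on a few real labels before training. For PyTorch the result could then be wrapped with torch.from_numpy(mask), which keeps the int64 (long) dtype.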
Thank you very much for your time, I'll give this a try.