Torchvision Semantic Segmentation


I am trying to use the torchvision scripts to train a semantic segmentation model on a custom dataset of 6 classes (background-red, face-yellow, eyes-blue, nose-green, hair-brown, lips-pink). I need the background class because the model is used for facial matting.

I have created a mask with 0 as default, 1-bg, 2-face,…, 6-lips. In the code, the cross entropy uses ignore_index=255 which I changed to ignore_index=0.

It should run fine, but I get the error: Target 255 is out of bounds with num_classes = 7

What am I doing wrong? I need help in understanding what the code base expects/how it works…


Based on the error message it seems your target tensor contains the class label 255, which is neither being ignored (via ignore_index) nor is a valid class index, as it’s out of bounds.
It seems you want to map the class index 255 to 0 and ignore it. If that’s the case, the mapping seems to fail, since values of 255 are still found in the target.
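
A quick way to confirm this is to list the label values that actually occur in the target. The sketch below is only an illustration: find_invalid_labels is a hypothetical helper (not part of torchvision), the toy mask stands in for a real one, and it assumes 7 classes with ignore_index=0 as in the question.

```python
import torch

def find_invalid_labels(target, num_classes, ignore_index):
    """Return label values that are neither a valid class index nor ignore_index."""
    values = torch.unique(target)  # sorted unique label values
    valid = ((values >= 0) & (values < num_classes)) | (values == ignore_index)
    return values[~valid].tolist()

# Toy 4x4 mask standing in for a real target; 255 mimics a stray "void" label.
target = torch.tensor([[0, 1, 2, 3],
                       [4, 5, 6, 255],
                       [0, 0, 1, 1],
                       [2, 2, 255, 6]])

invalid = find_invalid_labels(target, num_classes=7, ignore_index=0)
print(invalid)  # [255]

# One possible fix, assuming 255 should be treated like the ignored label 0:
remapped = target.clone()
remapped[remapped == 255] = 0
print(find_invalid_labels(remapped, num_classes=7, ignore_index=0))  # []
```

If the first list is not empty, the loss will complain about exactly those values, so the dataset or the mapping needs to be fixed before training.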

I found the issue. It was a faulty dataset.

My aim here is to understand how the segmentation code in the torchvision library works. Now, there are 23 classes (0 being the unlabeled class). So, I set ignore_index=0 and num_classes=23.

The original images are 4000x6000. The network sees random crops of 480x480. The mask is a single channel of target class values (0-22). Because of the cropping, some targets obviously do not contain the maximum value of 22.

Now I get the error: target 22 is out of bounds

What am I missing?

The output of the model for a segmentation use case (assuming you are using nn.CrossEntropyLoss) should have the shape [batch_size, nb_classes, height, width], even if not all classes are present in the current crop.
In your case it would thus be [batch_size, 23, height, width], where each channel would represent the logits for the corresponding class.
Based on the error message it seems your channel dimension in the output is smaller than 23.
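
The expected shapes can be checked with a small standalone example. This is a minimal sketch, not the torchvision training script: random tensors stand in for the model output and the mask, the crop is shrunk to 8x8 to keep it fast, and channels_cover_labels is a hypothetical helper added only for illustration.

```python
import torch
import torch.nn as nn

num_classes = 23                      # labels 0..22, with 0 meaning "unlabeled"
batch_size, height, width = 2, 8, 8   # tiny stand-in for the 480x480 crops

criterion = nn.CrossEntropyLoss(ignore_index=0)

# Stand-in for the model output: one logit channel per class.
logits = torch.randn(batch_size, num_classes, height, width)

# Stand-in for a cropped mask that contains only a few of the classes.
target = torch.randint(0, 5, (batch_size, height, width))
target[0, 0, 0] = 22                  # highest label, valid only with 23 channels

def channels_cover_labels(logits, target):
    """True if every label in target has a matching logit channel."""
    return int(target.min()) >= 0 and int(target.max()) < logits.shape[1]

print(logits.shape)                                   # torch.Size([2, 23, 8, 8])
print(channels_cover_labels(logits, target))          # True
print(channels_cover_labels(logits[:, :22], target))  # False: label 22 has no channel

loss = criterion(logits, target)      # pixels labeled 0 are skipped
print(loss.item())
```

Printing the shape of the real model output right before the loss call shows whether the channel dimension is actually 23. Note that ignore_index=0 only skips those pixels in the loss; it does not reduce the number of output channels the model needs.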

Hi @ptrblck

Thanks a lot for all the help. Really appreciate it.