Target 1 is out of bounds

I have a problem that I don’t understand. It concerns the CrossEntropyLoss function. I am currently working on segmentation of a certain type of pattern in medical images. For this, I have built an architecture that returns, after a softmax, a tensor of output images of shape [N, C, H, W], where N is my batch size, C the number of channels in each image, and H and W the height and width of my image respectively. However, when I pass outputs of shape [10, 1, 240, 240] and targets of shape [10, 240, 240], it always returns:

IndexError: Target 1 is out of bounds.

I’ve seen many posts about this on the forums. Each time they mention “classes”, but I don’t understand what that means. It seems to correspond to the number of channels of my output images (in my case 1, because they are 2D binary images); however, it doesn’t work. I think I’m missing something.

Could someone please enlighten me?

I am attaching the few lines in question.

        # images => [10, 3, 240, 240]
        outputs = segnetModel(images)

        # outputs => [10, 1, 240, 240]
        # labels  => [10, 240, 240]
        loss = cost(outputs, labels)

I also found a question on the forum that is very close to what I wanted but still talks about “classes”: “The cost function for semantic segmentation?”

Thank you very much for your help in advance.

As you’ve already pointed out, the issue is caused by the missing “class dimension” in the output.
In a multi-class segmentation use case, nn.CrossEntropyLoss expects the model output to have the shape [batch_size, nb_classes, height, width], where each “channel” is an activation map that gives the logits (raw, unnormalized scores) of one class for each pixel location. The target should have the shape [batch_size, height, width] and contain class indices in the range [0, nb_classes-1].
E.g. if you are working on a binary segmentation, you could treat it as a 2-class multi-class segmentation using an output of shape [batch_size, 2, height, width], where the first channel would correspond to the logits of each pixel for class0 and the second channel to the logits for class1.
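A minimal sketch of the 2-class setup described above. The shapes are taken from the question; the random tensors are only stand-ins for the real model output and masks, and the variable names are illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

batch_size, nb_classes, height, width = 10, 2, 240, 240

# Stand-in for the model output: raw logits, one channel per class.
# (Random values here; in real code this would come from the model.)
outputs = torch.randn(batch_size, nb_classes, height, width)

# Targets hold one class index per pixel, in [0, nb_classes - 1], dtype long.
labels = torch.randint(0, nb_classes, (batch_size, height, width))

criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, labels)  # scalar tensor (mean over all pixels)

# Per-pixel predicted class: index of the largest logit along the class dim.
preds = outputs.argmax(dim=1)  # shape [batch_size, height, width]
```

Note that nn.CrossEntropyLoss applies a log-softmax internally, so the model should return raw logits rather than softmax outputs.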

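In your current code you are using a single output channel, so the only valid class index is 0 and your model would only be able to predict class0, which makes it a bit useless. Any target pixel with the value 1 is then out of bounds, which is exactly what the error message says.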
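To illustrate why a single channel cannot work with this loss, here is a small sketch with random stand-in tensors (not the original model). It assumes a CPU run; the exact exception type may differ between PyTorch versions, so both common ones are caught:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

outputs = torch.randn(10, 1, 240, 240)               # a single "class" channel
labels = torch.ones(10, 240, 240, dtype=torch.long)  # every pixel labelled as class 1

# Softmax over a dimension of size 1 gives 1.0 everywhere,
# so the output carries no information about the pixel.
probs = torch.softmax(outputs, dim=1)
all_ones = torch.allclose(probs, torch.ones_like(probs))

# With one channel the only valid target index is 0, so a target of 1 fails.
error_message = None
try:
    nn.CrossEntropyLoss()(outputs, labels)
except (IndexError, RuntimeError) as err:
    error_message = str(err)

print(all_ones, error_message)
```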

In any case, since you are working on a binary segmentation, you could alternatively keep the single output channel and use nn.BCEWithLogitsLoss, which would then map a “low” value to class0 and a “high” one to class1. In that case your target should have the same shape as the model output and be a floating point tensor.
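A minimal sketch of the single-channel alternative, again with random stand-in tensors and illustrative names. It assumes the masks are stored as integer tensors of shape [10, 240, 240], as in the question, and that the model returns raw logits (no sigmoid or softmax at the end):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

outputs = torch.randn(10, 1, 240, 240)        # single-channel raw logits
labels = torch.randint(0, 2, (10, 240, 240))  # integer binary masks

# Add the channel dimension and cast to float so the target
# matches the model output in shape and dtype.
targets = labels.unsqueeze(1).float()         # shape [10, 1, 240, 240]

criterion = nn.BCEWithLogitsLoss()            # applies the sigmoid internally
loss = criterion(outputs, targets)

# A logit of 0 corresponds to a probability of 0.5, so thresholding the
# logits at 0 gives the predicted binary mask.
preds = (outputs > 0).long()
```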


Thank you for your answer!
I finally used nn.BCEWithLogitsLoss and I could run my CNN (it doesn’t learn yet, but at least it runs ^^)!