Question about output and label channels in semantic segmentation

Yes, I am trying to debug it right now. I have done this before with TF, but I want to move to PyTorch, hence the effort.

Ok, let me know if you get stuck somewhere.

Ok, I think the batch dimension is causing the confusion: my batch size is 3, hence I am getting [3, 512, 512] as the shape.

In that case everything seems to work!
Sorry for missing this point, as I thought this shape refers to a single mask.
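For reference, here is a minimal sketch of the shapes involved. It assumes the loss is nn.CrossEntropyLoss (the thread does not say which criterion is used), and the tensors are random stand-ins with a small spatial size rather than the real 512x512 data:

```python
import torch
import torch.nn as nn

# Small spatial size for illustration; the thread uses 512x512
batch_size, nb_classes, height, width = 3, 12, 8, 8

# Stand-in for the model output: raw logits, one channel per class
logits = torch.randn(batch_size, nb_classes, height, width)

# Stand-in for the masks: one class index per pixel, no channel dimension
target = torch.randint(0, nb_classes, (batch_size, height, width))

# Assumed criterion: expects logits [N, C, H, W] and integer targets [N, H, W]
criterion = nn.CrossEntropyLoss()
loss = criterion(logits, target)

print(logits.shape, target.shape, loss.item())
```

So a target of shape [3, 512, 512] is what you would expect for a batch of 3 masks.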

Do you get any errors? Your code looks alright otherwise.

I do not get any errors. If you feel this is fine, then the important question is how to interpret the output, which has the shape
[batch_size, nb_classes, height, width], in my case [1, 12, 512, 512] for 1 image, and how to convert it back to an image with the segmentation.
predictions['out']
Out[76]:
tensor([[[[-0.0465, -0.0465, -0.0465, …, -0.0556, -0.0556, -0.0556],
[-0.0465, -0.0465, -0.0465, …, -0.0556, -0.0556, -0.0556],
[-0.0465, -0.0465, -0.0465, …, -0.0556, -0.0556, -0.0556],
…,

You see the logits in each channel corresponding to each class, i.e. channel0 gives the logits for class0, etc.
If you would like to get the predictions (as class indices), you could use:

preds = torch.argmax(predictions['out'], dim=1)

and could then visualize the predictions similar to your target.
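As a rough sketch of that last step, the class indices can be mapped to colors by indexing a palette. The random logits stand in for predictions['out'], and the palette is a made-up example (any mapping from class index to RGB would do):

```python
import torch

nb_classes = 12

# Stand-in for predictions['out'] for a single image (small size for illustration)
logits = torch.randn(1, nb_classes, 16, 16)

# Class index with the highest logit at each pixel -> shape [1, 16, 16]
preds = torch.argmax(logits, dim=1)

# Hypothetical palette: one random RGB color per class
generator = torch.Generator().manual_seed(0)
palette = torch.randint(0, 256, (nb_classes, 3), generator=generator, dtype=torch.uint8)

# Index the palette with the class map -> shape [1, 16, 16, 3]
color_mask = palette[preds]

print(preds.shape, color_mask.shape)
```

color_mask[0] is then an [H, W, 3] uint8 tensor that can be displayed like any RGB image.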

Brilliant! Thanks a lot, let me try this out.

Do the values 2-9 represent the 12 classes in the labels?

Also, why does the label have 3 channels?