How do I get the final output tensor in a semantic segmentation task?

I’m practicing a semantic segmentation task.
Since there are two classes for each pixel, the output of the model has shape (1, 2, 512, 512). It has two channels, but I need to convert it to shape (1, 1, 512, 512) to match the shape of my label. What method should I use for this transformation?


I think the shape of the network output depends on which loss function you use.
For binary classification, you can use either CELoss (nn.CrossEntropyLoss) or BCELoss.
For CELoss, the output should have shape [batch_size, nb_classes, H, W]; a softmax over the class dimension then turns it into a probability map.
For BCELoss, the output should have shape [batch_size, H, W], the same as the label, and it should be combined with a sigmoid.

Note: CELoss already contains log-softmax, so you only need to pass the raw model output and the label.
BCEWithLogitsLoss = sigmoid + BCELoss
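A minimal sketch of the two setups described in this reply, in case it helps to see the shapes side by side. The tensor names and the small 8 x 8 size (standing in for 512 x 512) are assumptions for illustration, and random tensors stand in for a real model output and label.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch_size, height, width = 1, 8, 8  # small stand-in for 512 x 512

# Option 1: two output channels + CrossEntropyLoss.
# The loss applies log-softmax internally, so pass raw logits.
logits_2ch = torch.randn(batch_size, 2, height, width)          # [N, C, H, W]
target_idx = torch.randint(0, 2, (batch_size, height, width))   # [N, H, W], int64 class indices
ce_loss = nn.CrossEntropyLoss()(logits_2ch, target_idx)

# Option 2: one output channel + BCEWithLogitsLoss.
# Output and label must have the same shape, and the label must be float.
logits_1ch = torch.randn(batch_size, height, width)             # [N, H, W]
target_float = target_idx.float()
bce_loss = nn.BCEWithLogitsLoss()(logits_1ch, target_float)

# BCEWithLogitsLoss should match an explicit sigmoid followed by BCELoss.
bce_manual = nn.BCELoss()(torch.sigmoid(logits_1ch), target_float)
```

If the label is stored as (1, 1, 512, 512), something like label.squeeze(1).long() would give the [N, H, W] index tensor that CrossEntropyLoss expects.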

How do I transform the final output tensor, i.e. (1, 2, 512, 512), into the ground-truth shape (1, 1, 512, 512)? I already know how to use CELoss and BCELoss.

Have you solved the problem?


This line worked for me.

_, pred = torch.max(scores, dim=1)

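Here scores is your final output tensor containing the class probabilities (raw logits work too, since softmax does not change which class has the maximum).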

For instance, if the scores tensor has size [10, 150, 256, 256], meaning there are 150 classes to segment, the code above gives a [10, 256, 256] tensor. You can then call .unsqueeze(1) to get the desired [10, 1, 256, 256] shape.

PS: torch.max with a dim argument returns a (values, indices) tuple; the second element is the tensor of predicted class indices you want.
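Here is a small runnable sketch of that conversion for the two-class case from the question. The random scores tensor and the 8 x 8 size (standing in for 512 x 512) are assumptions for illustration; argmax with keepdim=True is shown as an alternative I believe is equivalent, not something from the original reply.

```python
import torch

torch.manual_seed(0)
scores = torch.randn(1, 2, 8, 8)     # stand-in for a (1, 2, 512, 512) model output

# torch.max over the class dimension returns (values, indices);
# the indices are the predicted class per pixel.
_, pred = torch.max(scores, dim=1)   # shape [1, 8, 8]
pred = pred.unsqueeze(1)             # shape [1, 1, 8, 8], matches the label layout

# Alternative: argmax keeps the class dimension directly.
pred_alt = scores.argmax(dim=1, keepdim=True)   # shape [1, 1, 8, 8]
```

Both results hold int64 class indices (0 or 1 here), so they can be compared against an integer label mask of the same shape.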


Thank you very much!