I’m working on a semantic segmentation task.
Since there are two classes for each pixel, the output of the model has shape (1, 2, 512, 512). There are two channels, but I need to convert it to shape (1, 1, 512, 512) to match the shape of my label. What method should I use for this transformation?
Hello,
I think the output shape of the network depends on which loss function you employ.
For binary classification, you can use either CrossEntropyLoss or BCELoss.
For CrossEntropyLoss, the output shape should be [batch_size, nb_classes, H, W], and a probability map is produced by softmax.
For BCELoss, the shape should be [batch_size, H, W], the same as the label, and it should be combined with a sigmoid.
Note: CrossEntropyLoss already contains log-softmax internally, so you can pass the raw model output and the label directly. BCEWithLogitsLoss = BCELoss + sigmoid.
See the docs for more details.
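To make the shape requirements concrete, here is a minimal sketch with random tensors sized as in the question (batch of 1, 2 classes, 512×512); the tensor names are just illustrative:

```python
import torch
import torch.nn as nn

# --- CrossEntropyLoss: output [batch, nb_classes, H, W], target [batch, H, W] of class indices ---
logits = torch.randn(1, 2, 512, 512)             # raw model output (no softmax needed)
target_ce = torch.randint(0, 2, (1, 512, 512))   # integer class labels
ce_loss = nn.CrossEntropyLoss()(logits, target_ce)  # log-softmax applied internally

# --- BCEWithLogitsLoss: single-channel output, float target of the same shape ---
logits_1ch = torch.randn(1, 512, 512)            # one score per pixel (no sigmoid needed)
target_bce = torch.randint(0, 2, (1, 512, 512)).float()
bce_loss = nn.BCEWithLogitsLoss()(logits_1ch, target_bce)  # sigmoid applied internally
```

Both losses return a scalar tensor; note that CrossEntropyLoss wants integer class indices while BCEWithLogitsLoss wants float targets.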
How do I transform the ultimate output tensor, i.e. (1, 2, 512, 512), into the ground-truth shape (1, 1, 512, 512)? I know how to use CELoss and BCELoss.
Have you solved the problem?
Hi,
@Mellow
This line worked for me:
_, pred = torch.max(scores, dim=1)
https://pytorch.org/docs/stable/torch.html#torch.max
where scores is your ultimate tensor containing the class scores.
For instance, if scores has size [10, 150, 256, 256], meaning there are 150 classes to segment, the code above gives a [10, 256, 256] tensor. You can then call .unsqueeze(1) to get your desired dimension.
PS: torch.max returns a tuple whose second value is the indices tensor you want.
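Putting the two steps together for the shapes in the original question (the scores tensor here is random, just a stand-in for the model output):

```python
import torch

# Stand-in for the model output: batch 1, 2 classes, 512x512.
scores = torch.randn(1, 2, 512, 512)

# torch.max over the class dimension returns (values, indices);
# the indices are the per-pixel predicted class labels.
_, pred = torch.max(scores, dim=1)   # shape: [1, 512, 512]

# Restore a channel dimension to match the (1, 1, 512, 512) label.
pred = pred.unsqueeze(1)             # shape: [1, 1, 512, 512]
```

torch.argmax(scores, dim=1, keepdim=True) would give the same result in one call.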
Thank you very much!