How to decode an image from a network output for semantic segmentation

I am learning semantic segmentation.

The dataset is Pascal VOC2012.
When I directly decode an image from the net output, I get a noisy gray image.
This is my loss function:
import torch.nn as nn
import torch.nn.functional as F

class CrossEntropyLoss2d(nn.Module):
    def __init__(self, weight=None):
        super().__init__()
        # Note: nn.NLLLoss2d is deprecated in recent PyTorch versions;
        # nn.NLLLoss accepts 4D input directly and can be used instead.
        self.loss = nn.NLLLoss2d(weight)

    def forward(self, outputs, targets):
        return self.loss(F.log_softmax(outputs, dim=1), targets)
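For reference, the loss above can be exercised with dummy tensors; this sketch assumes recent PyTorch, where `nn.NLLLoss` handles 4D input directly (the shapes and class count are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# nn.NLLLoss on 4D input mirrors the CrossEntropyLoss2d above
criterion = nn.NLLLoss()

outputs = torch.randn(2, 22, 8, 8)           # (batch, classes, H, W) logits
targets = torch.randint(0, 22, (2, 8, 8))    # (batch, H, W) class indices

loss = criterion(F.log_softmax(outputs, dim=1), targets)
```

The target must be a LongTensor of class indices with no channel dimension, which is what Pascal VOC ground-truth masks give you after mapping colors to class IDs.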

My net output has shape (batch_size, 22, height, width); the 22 is because there are 22 classes.

I do not know what the output is. Is it a pixel value, or the score that the pixel belongs to a class?

The output represents the logits of the 22 different classes. You can apply a softmax over the channel dimension to get the class probabilities. Each channel then holds a probability map for all pixels belonging to that class, so for example output[0, 7, :, :] gives you the map for class 7. To get the most likely class per pixel, apply torch.max (or torch.argmax) along dim=1.
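Concretely, with a dummy output tensor (the 4x4 spatial size is just for illustration; your real maps are larger):

```python
import torch
import torch.nn.functional as F

# Toy "network output": batch of 1, 22 classes, 4x4 spatial map
output = torch.randn(1, 22, 4, 4)

# Softmax over the channel (class) dimension gives per-pixel probabilities
probs = F.softmax(output, dim=1)
class7_map = probs[0, 7]            # probability map for class 7, shape (4, 4)

# Most likely class per pixel: values in 0..21, shape (1, 4, 4)
pred = torch.argmax(output, dim=1)
```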

Does this make it clearer or did I miss some points?

Yes, I understand what you said, and when I apply torch.max I get a map of numbers 0–21 (there are 22 classes). But I do not know how to turn this map into an image.

You could try something like this:

import torch
import matplotlib.pyplot as plt

num_classes = 22
output = torch.randn(1, num_classes, 96, 96)  # dummy network output
_, pred = torch.max(output, dim=1)            # class index per pixel

cmap = plt.get_cmap('viridis', num_classes)   # discrete colormap with 22 colors
seg_arr = pred[0].numpy()
plt.imshow(seg_arr, cmap=cmap)
plt.show()

Let me know if this is what you are looking for.
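If you would rather render the prediction with the Pascal VOC color palette instead of a generic matplotlib colormap, here is a sketch; the palette generation follows the standard VOC bit-interleaving convention, and the prediction here is random dummy data:

```python
import numpy as np

def voc_colormap(n=256):
    """Build the standard Pascal VOC color palette (bit-interleaving scheme)."""
    cmap = np.zeros((n, 3), dtype=np.uint8)
    for i in range(n):
        r = g = b = 0
        c = i
        for j in range(8):
            # Take the three lowest bits of the class index and spread
            # them across the high bits of the R, G, B channels.
            r |= ((c >> 0) & 1) << (7 - j)
            g |= ((c >> 1) & 1) << (7 - j)
            b |= ((c >> 2) & 1) << (7 - j)
            c >>= 3
        cmap[i] = [r, g, b]
    return cmap

# Dummy prediction map with class indices in 0..21
pred = np.random.randint(0, 22, size=(96, 96))

palette = voc_colormap()
color_img = palette[pred]   # shape (96, 96, 3), uint8 RGB image
```

Indexing the palette with the prediction map turns each class index into its RGB color, e.g. class 0 (background) is black and class 1 is dark red, matching the colors in the VOC ground-truth masks.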
