How to visualize segmentation output - multiclass feature map to RGB image?

For semantic segmentation outputs, how do we visualize the output feature map, a tensor of shape <B x Number_of_Classes x H x W>, as an RGB image of shape <B x 3 x H x W>, given a color map palette with one color per class:

labels = ['unlabeled', 'ego vehicle', 'rectification border', 'out of roi',
          'static', 'dynamic', 'ground', 'road', 'sidewalk', 'parking',
          'rail track', 'building', 'wall', 'fence', 'guard rail', 'bridge',
          'tunnel', 'pole', 'polegroup', 'traffic light', 'traffic sign',
          'vegetation', 'terrain', 'sky', 'person', 'rider', 'car', 'truck',
          'bus', 'caravan', 'trailer', 'train', 'motorcycle', 'bicycle',
          'license plate']
    cityscapes_map = np.array([[0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        ],
       [0.        , 0.        , 0.        ],
       [0.07843137, 0.07843137, 0.07843137],
       [0.43529412, 0.29019608, 0.        ],
       [0.31764706, 0.        , 0.31764706],
       [0.50196078, 0.25098039, 0.50196078],
       [0.95686275, 0.1372549 , 0.90980392],
       [0.98039216, 0.66666667, 0.62745098],
       [0.90196078, 0.58823529, 0.54901961],
       [0.2745098 , 0.2745098 , 0.2745098 ],
       [0.4       , 0.4       , 0.61176471],
       [0.74509804, 0.6       , 0.6       ],
       [0.70588235, 0.64705882, 0.70588235],
       [0.58823529, 0.39215686, 0.39215686],
       [0.58823529, 0.47058824, 0.35294118],
       [0.6       , 0.6       , 0.6       ],
       [0.6       , 0.6       , 0.6       ],
       [0.98039216, 0.66666667, 0.11764706],
       [0.8627451 , 0.8627451 , 0.        ],
       [0.41960784, 0.55686275, 0.1372549 ],
       [0.59607843, 0.98431373, 0.59607843],
       [0.2745098 , 0.50980392, 0.70588235],
       [0.8627451 , 0.07843137, 0.23529412],
       [1.        , 0.        , 0.        ],
       [0.        , 0.        , 0.55686275],
       [0.        , 0.        , 0.2745098 ],
       [0.        , 0.23529412, 0.39215686],
       [0.        , 0.        , 0.35294118],
       [0.        , 0.        , 0.43137255],
       [0.        , 0.31372549, 0.39215686],
       [0.        , 0.        , 0.90196078],
       [0.46666667, 0.04313725, 0.1254902 ],
       [0.        , 0.        , 0.55686275]])

In this case the number of output classes is 35, and I have a list of 35 class names and a 35x3 matrix of values that I assume are the R, G, B values for each class.
The output of my final classification layer is <B x 35 x H x W>.
It is a conventional image segmentation model: (multiple conv2d) -> (multiple convTranspose2d) -> 1x1 convolution (with number of filters = number of classes).

So naively I could write a function that loops over the 35 channels and, for each 2D map, writes the R, G, B values from the palette that correspond to the pixels in the output feature map into a single tensor.
But how do I know which channel index matches which entry in the original class list?

Is there a better way to do this?

In general, how are such outputs visualized?


You could get the class predictions using torch.argmax(output, dim=1), as this will take the argmax over your class dimension.
Then you could use this tensor to index directly into your colormap and visualize the images with a library like matplotlib:

import torch
import matplotlib.pyplot as plt

batch_size = 4
nb_classes = len(labels)
h, w = 96, 96

x = torch.randn(batch_size, nb_classes, h, w)
pred = torch.argmax(x, dim=1)  # [batch_size, h, w] class indices

# index the [nb_classes, 3] palette with the predicted class indices
pred_imgs = [cityscapes_map[p] for p in pred]

for pred_img in pred_imgs:
    plt.imshow(pred_img)
    plt.show()

Thanks a lot, that works perfectly.
Though I used _, pred = torch.max(output, dim=1) from one of your previous answers; does it differ from using argmax?

And you also gave me an answer for this: Index to rgb , tensor casting from given cmap , can close that.

Since it's in a training/testing loop, matplotlib doesn't make much sense.

Instead I was trying to write the image to a file,
using torchvision.utils.save_image() for that. However, it gives the following error:

pred_imgs = torch.from_numpy(pred_imgs[0])
from torchvision import utils as u 
u.save_image(pred_imgs, './pred_imgs.png')

Traceback (most recent call last):
  File "/home/saleem/anaconda3/envs/khaturia/lib/python3.5/site-packages/PIL/", line 2460, in fromarray
    mode, rawmode = _fromarray_typemap[typekey]
KeyError: ((1, 1, 150), '|u1')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/saleem/anaconda3/envs/khaturia/lib/python3.5/site-packages/torchvision-0.2.1-py3.5.egg/torchvision/", line 103, in save_image
  File "/home/saleem/anaconda3/envs/khaturia/lib/python3.5/site-packages/PIL/", line 2463, in fromarray
    raise TypeError("Cannot handle this data type")
TypeError: Cannot handle this data type

How do I write this image to a file?

Had to permute the tensor to channels-first order:
pred_imgs = pred_imgs.permute(2, 0, 1)

Now save_image works. :grinning:


Both functions are OK to use in this case, as possible performance differences won’t be visible for such a small workload.
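To make the equivalence concrete: torch.max with a dim argument returns a (values, indices) tuple, while torch.argmax returns only the indices; the indices match:

```python
import torch

x = torch.randn(2, 5, 4, 4)  # [batch, classes, h, w]

values, idx_max = torch.max(x, dim=1)   # max returns (values, indices)
idx_argmax = torch.argmax(x, dim=1)     # argmax returns indices only

assert torch.equal(idx_max, idx_argmax)
```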

Good to hear it’s working! :wink:

I have a similar question. I wonder, what is the function of 'cityscapes_map'? Thanks in advance!

You can find the array in the initial question.
Basically it is a 2D array of RGB values for each class label, as provided with the dataset.

Can you give a more detailed description? I'm confused about the 2D array you mentioned.

In semantic segmentation datasets, each class is assigned a unique RGB color for visualization.
The 2D array is number_of_classes x 3,
that is, one unique RGB value per class.

But since your objective function is a classification loss over multiple classes, you cannot use an RGB value as the label itself.
Thus you use the index of the class instead.
The output feature map after your classification layer gives probabilities:
every pixel value in a channel is the probability that the pixel belongs to that channel's class,
and you have N channels, where N = number of classes.
Thus for each class you get a 2D feature map of probabilities.
Taking the argmax over the channel dimension then gives, for each pixel, the index of the most likely class.

Then you map these index values back to the corresponding RGB values to visualize the result.

An analogy would be a word classifier with a fixed vocabulary, where the label is the index of the word in the vocab. To recover the actual word from the class probabilities, you map the predicted index back through your vocab.

Hope this helps!
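The index-to-color step described above can be sketched with a tiny two-class palette (toy values, not the Cityscapes ones):

```python
import numpy as np
import torch

palette = np.array([[0.0, 0.0, 0.0],    # class 0: black
                    [1.0, 0.0, 0.0]])   # class 1: red

logits = torch.tensor([[[0.9, 0.1],     # channel 0: class-0 scores
                        [0.2, 0.8]],
                       [[0.1, 0.9],     # channel 1: class-1 scores
                        [0.7, 0.3]]])   # shape [2 classes, 2, 2]

pred = torch.argmax(logits, dim=0)      # [2, 2] class indices: [[0, 1], [1, 0]]
rgb = palette[pred.numpy()]             # [2, 2, 3] RGB image
```

Pixel (0, 0) has a higher class-0 score (0.9 vs 0.1), so it becomes black; pixel (0, 1) has a higher class-1 score, so it becomes red, and so on.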

Hi, does it work the same if I store my color codes in a dictionary and use the same code that you mentioned here?

Sorry, another question: is there a reason why my kernel keeps dying whenever it runs this code:

pred_imgs = [cityscapes_map[p] for p in pred.cpu().numpy()]

Somehow, I am having issues visualizing what my predicted output looks like. I am using the same format as @sal with _, predicted = torch.max(output, dim=1).
Thank you

It should also work with a dict instead of an array.
Jupyter notebooks often fail to show the error message and just restart the kernel (if you are using one), so rerun the code in a terminal and check the error message there.
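One caveat if you go the dict route: a dict can't be fancy-indexed the way a NumPy array can, so build a lookup array from it first. A sketch, assuming a hypothetical dict keyed by class index 0..N-1:

```python
import numpy as np
import torch

color_dict = {0: (0.0, 0.0, 0.0),   # hypothetical dict palette
              1: (1.0, 0.0, 0.0),
              2: (0.0, 1.0, 0.0)}

# build a [nb_classes, 3] array so normal NumPy indexing works
palette = np.array([color_dict[i] for i in range(len(color_dict))])

pred = torch.randint(0, 3, (4, 4))   # [h, w] predicted class indices
rgb = palette[pred.numpy()]          # [h, w, 3] RGB image
```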

Sorry, how do I rerun it in a terminal? Haha, my bad, because I have been using Jupyter notebook all along… Do I need to add something like a main function at the bottom of the code?

No, you would have to export the code to a *.py script file and execute it in a terminal via python.