For semantic segmentation outputs, how do we visualize the output feature map, a tensor of shape <B x Number_of_Classes x H x W>,
as an RGB image of shape <B x 3 x H x W>,
given a color map palette with one entry per class:
labels = ['unlabeled', 'ego vehicle', 'rectification border', 'out of roi', 'static', 'dynamic', 'ground', 'road', 'sidewalk', 'parking', 'rail track', 'building', 'wall', 'fence', 'guard rail', 'bridge', 'tunnel', 'pole', 'polegroup', 'traffic light', 'traffic sign', 'vegetation', 'terrain', 'sky', 'person', 'rider', 'car', 'truck', 'bus', 'caravan', 'trailer', 'train', 'motorcycle', 'bicycle', 'license plate']
cityscapes_map = np.array([[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0. , 0. , 0. ],
[0.07843137, 0.07843137, 0.07843137],
[0.43529412, 0.29019608, 0. ],
[0.31764706, 0. , 0.31764706],
[0.50196078, 0.25098039, 0.50196078],
[0.95686275, 0.1372549 , 0.90980392],
[0.98039216, 0.66666667, 0.62745098],
[0.90196078, 0.58823529, 0.54901961],
[0.2745098 , 0.2745098 , 0.2745098 ],
[0.4 , 0.4 , 0.61176471],
[0.74509804, 0.6 , 0.6 ],
[0.70588235, 0.64705882, 0.70588235],
[0.58823529, 0.39215686, 0.39215686],
[0.58823529, 0.47058824, 0.35294118],
[0.6 , 0.6 , 0.6 ],
[0.6 , 0.6 , 0.6 ],
[0.98039216, 0.66666667, 0.11764706],
[0.8627451 , 0.8627451 , 0. ],
[0.41960784, 0.55686275, 0.1372549 ],
[0.59607843, 0.98431373, 0.59607843],
[0.2745098 , 0.50980392, 0.70588235],
[0.8627451 , 0.07843137, 0.23529412],
[1. , 0. , 0. ],
[0. , 0. , 0.55686275],
[0. , 0. , 0.2745098 ],
[0. , 0.23529412, 0.39215686],
[0. , 0. , 0.35294118],
[0. , 0. , 0.43137255],
[0. , 0.31372549, 0.39215686],
[0. , 0. , 0.90196078],
[0.46666667, 0.04313725, 0.1254902 ],
[0. , 0. , 0.55686275]])
In this case the number of output classes is 35, and I have a list of 35 class names and a 35x3 matrix whose rows I assume are the R, G, B values for each class.
The output of my final classification layer is <B x 35 x H x W>.
It is a conventional image segmentation model: (multiple Conv2d) -> (multiple ConvTranspose2d) -> a 1x1 convolution whose number of filters equals the number of classes.
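For context, a minimal sketch of such a model (assuming PyTorch; the channel widths, kernel sizes, and layer counts are illustrative, not my actual network):

```python
import torch
import torch.nn as nn

num_classes = 35  # matches the palette above

# Sketch: downsampling Conv2d stack -> upsampling ConvTranspose2d stack
# -> 1x1 convolution with filters = number of classes
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(64, 64, 4, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, num_classes, kernel_size=1),  # final classification layer
)

out = model(torch.randn(1, 3, 64, 64))  # -> shape (1, 35, 64, 64)
```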
So, naively, I could write a function that loops over the 35 channels and, for each 2D score map, writes the corresponding R, G, B values from the palette into a single output tensor at the pixels that belong to that class.
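The per-channel loop above can also be collapsed into an argmax over the class dimension followed by fancy indexing into the palette (a sketch, assuming NumPy; `logits` and the random palette here just stand in for the real network output and `cityscapes_map`):

```python
import numpy as np

# Stand-in B x C x H x W scores for the network output
B, C, H, W = 2, 35, 4, 6
rng = np.random.default_rng(0)
logits = rng.standard_normal((B, C, H, W))
palette = rng.random((C, 3))  # stand-in for the real 35x3 cityscapes_map

class_idx = logits.argmax(axis=1)   # B x H x W: winning class per pixel
rgb = palette[class_idx]            # B x H x W x 3 via fancy indexing
rgb = rgb.transpose(0, 3, 1, 2)     # B x 3 x H x W
```

Fancy indexing with `palette[class_idx]` looks up one palette row per pixel in a single vectorized operation, so no explicit loop over classes is needed.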
But how do I know which channel index corresponds to which entry in the original class list?
Is there a better way to do this?
In general, how are such outputs visualized?