How to combine separate annotations for multiclass semantic segmentation?

I have a binary semantic segmentation model (resnet16 + unet) which works out well, but I got additional annotations, so it won't be binary anymore. I have an image and 3 separate annotation masks (.png, not .json).

How can I combine and format the masks so I can pass them to the model?

You would need to transform the annotation masks to class indices by e.g. mapping the color codes to class indices, depending on your current mask format. The linked image seems to show a binary mask, so I’m unsure if you are using 3 binary masks now or a single mask with different colors/values.


Thanks for the help! At first I had just 1 binary mask. I got 2 more and want to combine/format all 3 into a single mask that can be used as the ground truth (so I can segment more things with just one model). The training image has a shape of (3, 224, 224) and each mask has a shape of (1, 224, 224). What should the shape of the combined mask be, what values should it contain, and how can I create it?

In that case you could map each mask to a class index, assuming the targets do not overlap (i.e. you are working on a multi-class segmentation use case), as seen here:

```python
import torch

# Create three non-overlapping binary masks
mask1 = torch.zeros(1, 24, 24).long()
mask1[:, :2, :2] = 1

mask2 = torch.zeros(1, 24, 24).long()
mask2[:, 3:6, 3:6] = 1

mask3 = torch.zeros(1, 24, 24).long()
mask3[:, 15:18, 15:18] = 1

# Map masks to their class indices
mask2[mask2 == 1] = 2
mask3[mask3 == 1] = 3

# Create the target mask; since the masks do not overlap,
# each pixel keeps a single class index
mask = mask1 + mask2 + mask3
print(mask.unique())
# > tensor([0, 1, 2, 3])
```

Thanks, I will give it a shot! Although I guess the different targets do overlap, since we have three classes: Roof Edge, Roof Corner, Roof Footprint. Some of them share the same pixels, since corners are contained within edges and everything is contained inside the footprint. Will there be issues with your proposed approach? What would be another approach to handle overlapping targets?

Tried to merge them to visualise them:

I guess I can make 3 separate binary segmentation models but I am pretty sure that it’s not better.

My proposed approach wouldn't work in this case, since the addition would create new classes.
E.g. if a specific pixel location contains class labels 1 and 2, the sum would be class index 3, which would be a new class representing the occurrence of both classes.
This could of course be a valid approach, in case you do want to create new class indices for the overlaps.

On the other hand, you could also use a multi-label segmentation, where the output might contain zero, one, or multiple active classes for each pixel. In that case you would use nn.BCEWithLogitsLoss as the criterion and the target would be a multi-hot encoded mask, which can be created by stacking the binary maps.
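A minimal sketch of the multi-label setup, assuming three hypothetical binary masks (edge, corner, footprint) with the (1, 224, 224) shape mentioned earlier; the class ordering and the random model output are placeholders:

```python
import torch
import torch.nn as nn

# Hypothetical binary masks with values in {0, 1}, shape (1, H, W)
edge = torch.randint(0, 2, (1, 224, 224)).float()
corner = torch.randint(0, 2, (1, 224, 224)).float()
footprint = torch.randint(0, 2, (1, 224, 224)).float()

# Stack into a multi-hot target of shape (num_classes, H, W);
# each pixel can have zero, one, or multiple active classes
target = torch.cat([edge, corner, footprint], dim=0)
print(target.shape)  # torch.Size([3, 224, 224])

# The model output has the same per-sample shape, and
# nn.BCEWithLogitsLoss is applied element-wise to the raw logits
logits = torch.randn(1, 3, 224, 224)  # fake model output with batch dim
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, target.unsqueeze(0))
```

Note that the model's last layer would output 3 channels here, and no softmax/sigmoid is applied before the loss, since `nn.BCEWithLogitsLoss` expects raw logits.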

You could also stick to the multi-class segmentation use case, but would need to make sure to only keep one active class for the overlaps.
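One way to keep a single active class per pixel is to assign class indices in order of priority, so the most specific class overwrites the broader ones at overlapping pixels. A sketch, assuming the ordering background=0, footprint=1, edge=2, corner=3 (the masks and priorities here are illustrative):

```python
import torch

# Hypothetical overlapping binary masks: corners lie on edges,
# and edges lie inside the footprint
footprint = torch.zeros(24, 24).long()
footprint[2:20, 2:20] = 1
edge = torch.zeros(24, 24).long()
edge[2:20, 2] = 1
edge[2:20, 19] = 1
corner = torch.zeros(24, 24).long()
corner[2, 2] = 1

# Write classes from lowest to highest priority, so the most specific
# class index wins wherever the masks overlap
target = torch.zeros(24, 24).long()
target[footprint == 1] = 1
target[edge == 1] = 2
target[corner == 1] = 3
print(target.unique())  # tensor([0, 1, 2, 3])
```

The pixel at (2, 2) belongs to all three masks but ends up with class index 3 (corner), since it was written last.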
