Question about Multi-class Semantic Segmentation

I basically have two masks, but I do not know how to prepare them for a semantic segmentation model like DeepLab or U-Net. There are 5 classes (not including the background).

[Image: color mask]


  1. Is there a PyTorch function to transform the mask into something readily digestible by the model?

  2. My model output is [batch_size, n_channels, height, width]. What strategy should I use here? Should I create a set of n_channels = n_classes masks with True/False per pixel (False being the mask)? Keep in mind that only a few images contain multiple classes; usually an image has only one class at a time. Or should I use 1 channel with class numbers instead?

  3. In line with the strategy in #2, what should the loss function and the output activation layer be?
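The two target encodings from question 2 can be compared in a small sketch. The 4×4 index mask below is made-up toy data with 6 labels (background 0 plus classes 1 to 5), and torch.nn.functional.one_hot is used to derive the n_classes-channel True/False version from it. Both carry the same information.

```python
import torch
import torch.nn.functional as F

num_classes = 6  # background (0) + 5 classes, as in the question

# Option A: one channel holding a class index per pixel, shape [H, W]
index_mask = torch.tensor([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
    [5, 5, 0, 0],
])

# Option B: one True/False channel per class, shape [num_classes, H, W]
one_hot = F.one_hot(index_mask, num_classes=num_classes)  # [H, W, C], int64
one_hot = one_hot.permute(2, 0, 1).bool()                  # [C, H, W]

print(index_mask.shape)  # torch.Size([4, 4])
print(one_hot.shape)     # torch.Size([6, 4, 4])

# Exactly one channel is True per pixel, so argmax over channels recovers option A.
recovered = one_hot.float().argmax(dim=0)
print(torch.equal(recovered, index_mask))  # True
```

Going from A to B is a one-liner, so storing the single-channel index mask on disk does not rule out a one-hot target later.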

If you are using PIL in the __getitem__ function of your Dataset class, you just have to open your mask image, which has all the class masks drawn in it, with mode "P". I am not sure what the advantage of a one-hot-encoding-like mask image is; I think it's better to have one image with all labels drawn in it.

Loss function should be fine using BCE or categorical cross-entropy.

  • Is there a PyTorch function to transform the mask into something readily digestible by the model?

To answer your first question, you should transform your image to a tensor and also perform normalization.

Here is my code that makes mask images.

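import math
from PIL import Image, ImageDraw

def make_palette(categories):
    # Index 0 is reserved for the background (black).
    plt_dict = {0: ["background", (0, 0, 0)]}
    # Flat [R, G, B, R, G, B, ...] list, the format Image.putpalette() expects.
    palette = [0, 0, 0]

    # Color number k of the enumeration over (i, ii, iii) in range(255)**3 is
    # computed directly instead of building a ~16.5 million entry list.
    n_colors = 255 ** 3
    distance = math.floor(n_colors / (len(categories) + 10))  # 10 >> buffer

    for idx, one_categ in enumerate(categories):
        class_id = idx + 1
        k = class_id * distance
        color = (k // (255 * 255), (k // 255) % 255, k % 255)
        plt_dict[class_id] = [one_categ, color]
        palette.extend(color)

    return plt_dict, palette


def mask_maker(palette_dict, img_id, height, width, palette, annotation, export_file_path):
    # Mode "P": one palette index (= class id) per pixel; 0 is the background.
    im = Image.new("P", (width, height), color=0)
    im.putpalette(palette)
    d = ImageDraw.Draw(im)

    # Map category names to class ids once instead of scanning the dict per region.
    name_to_index = {name: idx for idx, (name, _color) in palette_dict.items()}

    # Change this to however you have your annotation. Here, annotations with a
    # known category go to new_anno and everything else to temp_non_label.
    temp_non_label = []
    new_anno = []
    for anno in annotation:
        if anno["tags"] in name_to_index:
            new_anno.append(anno)
        else:
            temp_non_label.append(anno)

    for region in new_anno:
        category_index = name_to_index[region["tags"]]
        xy_tup_list = [(point["x"], point["y"]) for point in region["points"]]
        d.polygon(xy_tup_list, fill=category_index)

    # An empty annotation list simply produces an all-background mask.
    im.save(export_file_path)  # save as PNG so the palette indices are preserved
    return im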

  • You should transform your image to a tensor and also perform normalization.
Should I apply normalization to the tensor?

Do you have any code or a guide for semantic segmentation?

Also, could you clarify the inputs of your function? I'm sorry, I don't understand it.

What does mask_maker do?

mask_maker makes a color PNG mask image with multiple classes, like the one you have. It takes annotation information as input, containing polygon coordinates and label names, like "car" and [{"x": 10, "y": 20}, {"x": 23, "y": 33}, {"x": 1, "y": 5}]. To make a mode "P" image, you first have to make a palette; that's what the first function is for.
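The idea behind mode "P" can be checked end to end in a few lines: draw a polygon whose fill value is the class id, save as PNG, reopen, and read the pixel values back. This is a sketch, not the code above; the category list, palette colors, and 40×40 size are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import io

import numpy as np
from PIL import Image, ImageDraw

# Hypothetical category list; index 0 is reserved for the background.
categories = ["car", "person"]
name_to_index = {name: i + 1 for i, name in enumerate(categories)}

# Flat [R, G, B, R, G, B, ...] palette; these colors are arbitrary.
palette = [0, 0, 0,      # 0: background
           255, 0, 0,    # 1: car
           0, 0, 255]    # 2: person

# Annotation in the format described above: a label name plus polygon points.
annotation = [{"tags": "car",
               "points": [{"x": 10, "y": 20}, {"x": 23, "y": 33}, {"x": 1, "y": 5}]}]

mask = Image.new("P", (40, 40), color=0)  # every pixel starts as background
mask.putpalette(palette)
draw = ImageDraw.Draw(mask)
for region in annotation:
    points = [(p["x"], p["y"]) for p in region["points"]]
    # In mode "P" the fill value is a palette index, i.e. the class id.
    draw.polygon(points, fill=name_to_index[region["tags"]])

buffer = io.BytesIO()            # stands in for a .png file on disk
mask.save(buffer, format="PNG")  # PNG stores the palette and the indices
buffer.seek(0)

reopened = Image.open(buffer)
labels = np.array(reopened)      # [H, W] array of class indices
print(reopened.mode)             # P
print(np.unique(labels))         # [0 1]
```

The palette only decides how the mask looks in an image viewer; the model only ever sees the indices in labels.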

I basically don't have the annotation information, only the mask image.

Oh, I completely misunderstood you, then. In that case I guess you can just open the mask image with PIL in mode "P" and transform it into a tensor. Your __getitem__ function in the Dataset class would look something like this:

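from PIL import Image

def __getitem__(self, index):  # method of your Dataset class
    image_file_path = self.img_list[index]
    raw_img = Image.open(image_file_path)
    raw_img = raw_img.convert('RGB')
    anno_file_path = self.mask_list[index]
    # A palette PNG opens in mode "P"; each pixel value is a class index.
    anns_img = Image.open(anno_file_path)
    # self.transform is your own transform, applied jointly to image and mask
    raw_img, anns_img = self.transform(self.phase, raw_img, anns_img)
    return raw_img, anns_img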
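If the mask file is an RGB color image (like the color mask in the question) rather than a palette PNG, its pixel values are colors, not class indices, and they need to be mapped first. A minimal sketch, assuming a hypothetical COLOR_TO_CLASS table; substitute the colors actually used in your masks. Pixels whose color is not in the table are left as background, which is also an assumption, and the mapping only works if the masks were saved losslessly (e.g. PNG, not JPEG).

```python
import numpy as np
from PIL import Image

# Hypothetical color table: replace with the RGB values used in your own masks.
COLOR_TO_CLASS = {
    (0, 0, 0): 0,        # background
    (255, 0, 0): 1,
    (0, 255, 0): 2,
    (0, 0, 255): 3,
    (255, 255, 0): 4,
    (255, 0, 255): 5,
}

def rgb_mask_to_index(mask_img):
    """Map an RGB color mask (PIL image) to an [H, W] array of class indices."""
    rgb = np.array(mask_img.convert("RGB"))           # [H, W, 3], uint8
    index = np.zeros(rgb.shape[:2], dtype=np.int64)   # unknown colors stay 0
    for color, class_id in COLOR_TO_CLASS.items():
        matches = np.all(rgb == np.array(color, dtype=np.uint8), axis=-1)
        index[matches] = class_id
    return index

# Tiny example: a 2x3 mask with background, class 1 and class 3 pixels.
example = Image.fromarray(np.array([
    [[0, 0, 0], [255, 0, 0], [255, 0, 0]],
    [[0, 0, 255], [0, 0, 0], [0, 0, 0]],
], dtype=np.uint8))
print(rgb_mask_to_index(example))
# [[0 1 1]
#  [3 0 0]]
```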

torchvision has a ToTensor transform (torchvision.transforms.ToTensor).

Since I have an n-channel mask with 5 channels (5 classes), what should the loss function be? What about the activation? It seems that only one channel in my model learns, and all segmentations end up on this channel.
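For reference, the two losses mentioned in the thread can be set up as below. With an index mask, nn.CrossEntropyLoss takes the raw logits [batch_size, n_classes, H, W] (no softmax layer at the end, since it applies log-softmax internally) and a [batch_size, H, W] long target. With a one-hot mask, nn.BCEWithLogitsLoss takes the same logits and a float target of the same shape. This is a sketch with random tensors standing in for the model; it assumes 6 output channels (5 classes plus background), and it is not a diagnosis of the one-channel problem, though feeding softmaxed outputs or float targets to CrossEntropyLoss is worth ruling out.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch_size, n_classes, height, width = 2, 6, 4, 4  # 5 classes + background (assumption)

# Stand-in for the model output: raw logits, no softmax/sigmoid at the end.
logits = torch.randn(batch_size, n_classes, height, width, requires_grad=True)

# Option 1: index mask -> CrossEntropyLoss (log-softmax over dim=1 is built in).
target_index = torch.randint(0, n_classes, (batch_size, height, width))  # int64
ce_loss = nn.CrossEntropyLoss()(logits, target_index)

# Option 2: one-hot mask -> BCEWithLogitsLoss (per-channel sigmoid is built in).
target_one_hot = F.one_hot(target_index, n_classes).permute(0, 3, 1, 2).float()
bce_loss = nn.BCEWithLogitsLoss()(logits, target_one_hot)

ce_loss.backward()  # cross-entropy sends gradient to every channel of the logits
print(ce_loss.item() > 0, bce_loss.item() > 0)  # True True

# At inference, the predicted class per pixel is the channel with the largest logit.
prediction = logits.argmax(dim=1)  # [batch_size, height, width]
print(prediction.shape)            # torch.Size([2, 4, 4])
```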