I basically have two masks, but I do not know how to prepare them for a semantic segmentation model like DeepLab or U-Net. The dataset has 5 classes (not including the background).
1. Is there a PyTorch function to transform the mask into something readily digestible by the model?
2. My model output is [batch_size, n_channels, height, width]. What strategy should I use here? Should I create an n_channels = n_classes set of masks with True/False per pixel (False being the background)? Keep in mind that there are only a few instances of multiple classes in one image; usually an image has only one class at a time. Or should I use 1 channel with class numbers instead?
3. In line with the strategy in #2, what should the loss function and output activation layer be?
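For concreteness, here is a minimal sketch (assuming 5 classes plus background, so n_classes = 6) of how the two mask layouts I am deciding between relate to each other:

```python
import torch
import torch.nn.functional as F

n_classes = 6  # 5 classes + background (assumed)
index_mask = torch.randint(0, n_classes, (4, 4))  # 1 channel of class numbers

# one-hot layout: one True/False channel per class
one_hot = F.one_hot(index_mask, n_classes)   # [H, W, n_classes]
one_hot = one_hot.permute(2, 0, 1).float()   # [n_classes, H, W]

# the two layouts are interconvertible
recovered = one_hot.argmax(dim=0)            # back to [H, W]
assert torch.equal(recovered, index_mask)
```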
if you are using PIL in the __getitem__ function of your Dataset class, you just have to open your mask image, which has all the class masks, with mode "P". I am not sure what the advantage of a one-hot-encoding-like mask image would be. I think it's better to have one image with all the labels drawn in it.
Here is my code that makes mask images.
Loss function should be fine using BCE or categorical cross entropy.
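For the single-channel index-mask strategy, a minimal sketch with nn.CrossEntropyLoss (the tensor shapes here are assumptions: 5 classes + background):

```python
import torch
import torch.nn as nn

# CrossEntropyLoss takes raw logits, so no softmax is needed at the output
criterion = nn.CrossEntropyLoss()

logits = torch.randn(2, 6, 64, 64)         # model output: [B, n_classes, H, W]
target = torch.randint(0, 6, (2, 64, 64))  # index mask:   [B, H, W], dtype long
loss = criterion(logits, target)
```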
To answer your first question, you should transform your image to a tensor and also perform normalization.
import math
import random

from tqdm import tqdm


def make_palette(categories):
    plt_dict = {0: ["background", (0, 0, 0)]}
    all_colors_map = []
    palette = [0, 0, 0]
    # enumerate every RGB color
    for i in tqdm(range(256)):
        for ii in range(256):
            for iii in range(256):
                all_colors_map.append([i, ii, iii])
    # shuffle, then pick colors spaced well apart so classes are distinguishable
    random.shuffle(all_colors_map)
    distance = len(all_colors_map) / (len(categories) + 10)  # 10 >> buffer
    distance = math.floor(distance)
    for idx, one_categ in enumerate(categories):
        id = idx + 1
        name = one_categ  # label word, e.g. "car"
        color = all_colors_map[(idx + 1) * distance]
        palette.extend(color)
        plt_dict[id] = [name, tuple(color)]
    return plt_dict, palette
from PIL import Image, ImageDraw


def mask_maker(palette_dict, img_id, height, width, palette, annotation, export_file_path):
    # mode "P" stores one palette index per pixel; fill with index 0 (background)
    im = Image.new("P", (width, height), color=0)
    im.putpalette(palette)
    d = ImageDraw.Draw(im)
    if len(annotation) == 0:
        im.save(export_file_path)
        return
    # sort: draw the "label" regions first
    temp_non_label = []
    new_anno = []
    for anno in annotation:
        if anno["tags"] == "label":
            new_anno.append(anno)
        else:
            # temp_non_label.append(anno)
            pass
    new_anno.extend(temp_non_label)
    for region in new_anno:
        category_id = region["tags"]
        # look up the palette index that belongs to this label name
        category_index = 0
        for a in palette_dict:
            if palette_dict[a][0] == category_id:
                category_index = a
                break
        xy_tup_list = []
        for point in region["points"]:
            xy_tup_list.append((point["x"], point["y"]))
        d.polygon(xy_tup_list, fill=category_index)
    # change to however you have your annotation.
    im.save(export_file_path)
mask_maker makes a color PNG mask image like the one you have, with multiple classes. It takes annotation information as input, containing polygon coordinates and label names, like "car" and [{"x": 10, "y": 20}, {"x": 23, "y": 33}, {"x": 1, "y": 5}]. To make a mode-"P" image, you first have to make a palette; that's what the first function is for.
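As a stripped-down, standalone illustration of the same idea (a hypothetical two-entry palette and a single polygon):

```python
from PIL import Image, ImageDraw

# index 0 -> black (background), index 1 -> red (hypothetical "car" class)
palette = [0, 0, 0, 255, 0, 0]

im = Image.new("P", (40, 40), color=0)  # every pixel starts at index 0
im.putpalette(palette)
# fill= is the palette index, so each pixel value is directly the class id
ImageDraw.Draw(im).polygon([(10, 20), (23, 33), (1, 5)], fill=1)
im.save("mask.png")
```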
Oh, I completely misunderstood you then. In that case you can just open the image with PIL in mode "P" and transform it into a tensor. Your __getitem__ function in the Dataset class looks something like this.
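Something along these lines (the class name and path handling are placeholders; I'm assuming a CrossEntropyLoss-style index target):

```python
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class SegDataset(Dataset):  # hypothetical name, for illustration
    def __init__(self, img_paths, mask_paths, transform=None):
        self.img_paths = img_paths
        self.mask_paths = mask_paths
        self.transform = transform

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, idx):
        img = Image.open(self.img_paths[idx]).convert("RGB")
        # a mode-"P" mask already stores the class index per pixel, so
        # converting it to a tensor gives the target CrossEntropyLoss expects
        mask = Image.open(self.mask_paths[idx]).convert("P")
        mask = torch.from_numpy(np.array(mask)).long()  # [H, W] class indices
        if self.transform is not None:
            img = self.transform(img)
        return img, mask
```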
Since I have an n_channel mask of 5 (5 classes), what should the loss function be? Also the activation? It seems that only one channel in my model learns, and all segmentations end up on that channel.