I am trying to finetune the fcn_resnet101 segmentation model with my own dataset, and I am currently stuck on the step where I need to convert RGB mask images into ones that contain the class index for each pixel.
My input mask is an RGB image with one color per class (i.e. black for background, blue for car).
I adapted the code I found in this post to the following:
def mask_to_class(self, mask):
    #target = torch.from_numpy(mask)
    target = mask
    h, w = target.shape[0], target.shape[1]
    masks = torch.empty(h, w, dtype=torch.long)
    colors = torch.unique(target.view(-1, target.size(2)), dim=0).numpy()
    #print("colors: " + colors)
    print("len(colors): " + str(len(colors)))
    target = target.permute(2, 0, 1).contiguous()
    mapping = {tuple(c): t for c, t in zip(colors.tolist(), range(len(colors)))}
    #print("mapping: " + str(mapping))
    for k in mapping:
        print("k: " + str(k))
        idx = (target == torch.tensor(k, dtype=torch.uint8).unsqueeze(1).unsqueeze(2))
        validx = (idx.sum(0) == 3)
        masks[validx] = torch.tensor(mapping[k], dtype=torch.long)
    return masks
To be honest, I don’t fully understand the code above, but I assume from the post and the name of the function that the purpose of mask_to_class is to convert the RGB masks to masks that contain the class index instead of the RGB value.
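For reference, this is a toy sketch of what I expect the conversion to do on a tiny 2x2 mask (the color values, uint8 dtype, and HWC layout here are my own assumptions for illustration, not from the original post):

```python
import torch

# Toy 2x2 RGB mask in H x W x C layout: black background plus one "blue" pixel
mask = torch.tensor([[[0, 0, 0], [0, 0, 0]],
                     [[0, 0, 0], [0, 0, 255]]], dtype=torch.uint8)

# Collect the unique colors (one row per color) and assign each an index
colors = torch.unique(mask.view(-1, 3), dim=0)
mapping = {tuple(c.tolist()): i for i, c in enumerate(colors)}

# Build the class-index mask by matching all three channels per pixel
target = torch.empty(mask.shape[:2], dtype=torch.long)
for color, idx in mapping.items():
    matches = (mask == torch.tensor(color, dtype=torch.uint8)).all(dim=-1)
    target[matches] = idx

print(mapping)  # {(0, 0, 0): 0, (0, 0, 255): 1}
print(target)   # tensor([[0, 0], [0, 1]])
```

So for a clean two-color mask I would expect exactly two entries in the mapping and a target containing only 0 and 1.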
I then call that function in the __getitem__ function below:
def __getitem__(self, index):
    image = Image.open(self.image_paths + self.image_dirs[index])
    mask = Image.open(self.target_paths + self.target_dirs[index])
    image, mask = self.transform(image, mask)
    mask = self.mask_to_class(mask)
    return image, mask
And my transform function looks like this:
def transform(self, image, mask):
    # Random horizontal flipping
    if random.random() > 0.5:
        image = TF.hflip(image)
        mask = TF.hflip(mask)
    # Random vertical flipping
    if random.random() > 0.5:
        image = TF.vflip(image)
        mask = TF.vflip(mask)
    # Transform to tensor
    image = TF.to_tensor(image)
    mask = TF.to_tensor(mask)
    #Normalize? Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    image = TF.normalize(image, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    return image, mask
But when I run the code, I get this error:
File "/home/info/backend/MyDataset.py", line 142, in __getitem__
    mask = self.mask_to_class(mask)
File "/home/info/backend/MyDataset.py", line 128, in mask_to_class
    idx = (target==torch.tensor(k, dtype=torch.uint8).unsqueeze(1).unsqueeze(2))
RuntimeError: Expected object of scalar type Float but got scalar type Byte for argument #2 'other'
I appreciate your help on this.
Also, something weird that I noticed while trying to debug this: when I printed out len(colors), I got the following, even though I was expecting len(colors) to be 2 since there are only two colors in the mask:
len(colors): 529
len(colors): 711
len(colors): 715
len(colors): 775
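For what it's worth, this is how I sanity-check the color count of a mask, shown here on a small synthetic black/blue mask (for my real data I would replace the synthetic array with Image.open on one of my mask paths):

```python
from PIL import Image
import numpy as np

# Build a synthetic 4x4 mask: blue square on a black background
arr = np.zeros((4, 4, 3), dtype=np.uint8)
arr[2:, 2:] = [0, 0, 255]
img = Image.fromarray(arr, mode="RGB")

# Count the unique RGB colors, one row per pixel
colors = np.unique(np.array(img).reshape(-1, 3), axis=0)
print(len(colors))  # 2 for a clean two-color mask
```

On a clean mask this prints 2, so the hundreds of colors I'm seeing suggests the values are getting altered somewhere between loading and mask_to_class.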