Multiclass segmentation U-net masks format

I saw an example here, How make customised dataset for semantic segmentation?, of how to build a dataset, but I must admit that I still have some doubts.

Basically I have PNG images already labelled with indices 0, 1, 2, 3 … 23. It's a custom palette that I made, so technically I should be able to skip the mapping step from the example, or am I wrong?

self.mapping = {
    0: 0,
    255: 1
}

def mask_to_class(self, mask):
    for k in self.mapping:
        mask[mask == k] = self.mapping[k]
    return mask
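To check whether the remapping is really unnecessary, I think one can just inspect the unique values in the mask PNG (a minimal sketch; I build a toy palette image in place of a real file):

```python
import numpy as np
from PIL import Image

# Toy palette-indexed ("P" mode) mask standing in for a real file;
# with the actual dataset you would use Image.open("mask.png") instead.
mask = Image.fromarray(np.array([[0, 3], [23, 1]], dtype=np.uint8), mode="P")

# np.array on a "P"-mode image returns the palette indices themselves
values = np.unique(np.array(mask))
print(values)  # if these are already 0..23, no {pixel: class} remapping is needed
```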

Do you have a snippet of a code that can create the class indices in the shape [batch_size, height, width] as you said?
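For reference, my current understanding is that something like this should produce the class-index tensor (a sketch with a dummy mask; I'm not sure this is exactly what was meant):

```python
import numpy as np
import torch

# Dummy 64x64 mask with class indices 0..23, standing in for np.array(pil_mask)
mask_np = np.random.randint(0, 24, size=(64, 64), dtype=np.uint8)

# Per sample: a LongTensor of shape [H, W] (the dtype class-index losses expect)
mask_t = torch.from_numpy(mask_np).long()

# The DataLoader then stacks samples into [batch_size, H, W]
batch = torch.stack([mask_t, mask_t])
print(batch.shape)  # torch.Size([2, 64, 64])
```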

Currently, just for one class, I was trying to apply mask = torch.from_numpy(mask), but I'm not sure if that's OK.

def transforms(self, image, mask):
    # Resize with nearest-neighbour so mask labels are not interpolated
    image = image.resize((64, 64), PIL.Image.NEAREST)
    mask = mask.resize((64, 64), PIL.Image.NEAREST)

    mask = np.array(mask)

    image = TF.to_tensor(image)
    mask = torch.from_numpy(mask)

    return [image, mask]
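One thing I noticed while testing this: torch.from_numpy keeps the numpy dtype, so a uint8 mask comes out as a ByteTensor, while losses that take class indices want torch.long. A quick check (dummy data, not my real images):

```python
import numpy as np
import torch

mask_np = np.zeros((64, 64), dtype=np.uint8)
mask = torch.from_numpy(mask_np)
print(mask.dtype)   # torch.uint8, not torch.int64
mask = mask.long()  # cast needed before e.g. nn.CrossEntropyLoss
print(mask.dtype)   # torch.int64
```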

Now the mask is actually between zero and one as it should be (the problem of handling more indices I still need to solve, of course), but once I try to train I get this:

ValueError: Target size (torch.Size([1, 64, 64])) must be the same as input size (torch.Size([1, 1, 64, 64]))
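If I read the error right, pred is [1, 1, 64, 64] (the model adds a channel dimension) while the mask is [1, 64, 64], and binary_cross_entropy_with_logits wants matching shapes. I guess unsqueezing the real target (rather than creating a new tensor) would fix the one-class case, something like this sketch with random data:

```python
import torch
import torch.nn.functional as F

pred = torch.randn(1, 1, 64, 64)                    # model output with channel dim
target = torch.randint(0, 2, (1, 64, 64))           # binary mask as loaded

# Add the missing channel dimension and match pred's float dtype
target = target.unsqueeze(1).float()                # -> [1, 1, 64, 64]
loss = F.binary_cross_entropy_with_logits(pred, target)
print(loss.item())
```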

I tried to work around it by creating a new target tensor at the start of the loss calculation:

def calc_loss(pred, target, metrics, bce_weight=0.5):
    # note: this discards the real target, and torch.empty is uninitialized memory
    target_ = torch.empty(batch_size, 1, 64, 64)
    target = target_.to(device)
    bce = F.binary_cross_entropy_with_logits(pred, target)

    pred = F.sigmoid(pred)
    dice = dice_loss(pred, target)

This does not crash, but the loss comes out as NaN. So yes, I guess I'm quite confused; I'm trying to read as much as I can but cannot figure out a lot at the moment, so any help would be great :slight_smile: thanks again!
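For the multiclass case (24 indices), my understanding is that nn.CrossEntropyLoss is the usual choice: predictions of shape [B, 24, H, W] as raw logits, and a LongTensor target of shape [B, H, W], with no empty tensor needed. A sketch with random data:

```python
import torch
import torch.nn as nn

num_classes = 24
pred = torch.randn(2, num_classes, 64, 64)           # raw logits from the network
target = torch.randint(0, num_classes, (2, 64, 64))  # class indices, dtype long

criterion = nn.CrossEntropyLoss()
loss = criterion(pred, target)
print(loss.item())
```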