Issues loading and padding semantically annotated images in PyTorch

Hi all

First post, I am a novice getting my feet wet with vision applications.

I am trying to do multi-class image segmentation using UNET. I have a well-controlled scene with 4 classes including background. My images are very large and vary in size, so as part of my preprocessing transforms I shrink them down to 512x512 squares before processing. (I think this is where things go wrong.) I do preserve the aspect ratio, so I pad the shorter side.

I label the images in labelme and convert the JSON to PNG format. I load the masks in PyTorch with Pillow, including the convert("L") option, even though they are RGB with the label colors.

I suspect the small float values in my label file are the reason why I get a prediction of everything being background. I am not sure how to do this properly.

How should I be doing this? Any help or advice would be much appreciated.

My mask transform is the following:

def pad_mask_with_aspect(x):
    # x is a (C, H, W) tensor; pad top/bottom so the mask becomes IMG_SIZE square
    c, h, w = x.shape
    h_new = int(IMG_SIZE * h / w)
    pad_amount = (IMG_SIZE - h_new) // 2
    square_image = transforms.Pad((0, pad_amount), fill=255, padding_mode='constant')
    return square_image(x)

mask_transform = transforms.Compose([
    transforms.Lambda(pad_mask_with_aspect),
])


Listing the unique values of the tensor yields a list that is way too long:

[0.00000000e+00 1.61301869e-05 3.39231308e-04 3.39586608e-04
 5.01137169e-04 1.69777311e-03 4.07475512e-03 4.12932783e-03
 5.09344367e-03 6.40318636e-03 6.40318682e-03 8.17018375e-03
 8.17018468e-03 1.09038642e-02 1.22726383e-02 1.22867087e-02
 1.24152694e-02 1.35340076e-02 1.45362820e-02 1.80453453e-02
 2.06966624e-02 2.06966642e-02 2.07012109e-02 2.08105519e-02
 2.16495525e-02 2.45127957e-02 2.45127976e-02 2.61451844e-02
 2.67280024e-02 2.79592983e-02 2.85708942e-02 2.86376886e-02
 2.89735086e-02 2.89735105e-02 2.89780572e-02 2.91824918e-02
 3.04137859e-02 3.17726135e-02 3.31801474e-02 3.32595184e-02
 3.42347920e-02 3.42982486e-02 3.45858186e-02 3.51697020e-02
 3.72549035e-02 3.80758382e-02 3.85971367e-02 4.07008678e-02
 4.24938761e-02 4.36097533e-02 4.37975042e-02 4.47742157e-02
 4.48223054e-02 4.49678339e-02 4.51908112e-02 4.59865220e-02
 4.71507385e-02 5.07869944e-02 5.11114448e-02 5.27576469e-02
 5.30091301e-02 5.35032935e-02 5.38131446e-02 5.53002469e-02
 5.64644635e-02 5.71890436e-02 5.71890473e-02 5.87928928e-02
 5.87928966e-02 5.99571094e-02 6.02331720e-02 6.11213259e-02
 6.11213297e-02 6.20854422e-02 6.20899908e-02 6.51400909e-02
 6.98983520e-02 7.03622922e-02 7.15992674e-02 7.17801824e-02
 7.26206154e-02 7.27634802e-02 7.27634877e-02 7.39277005e-02
 7.43655786e-02 7.86391348e-02 7.86482319e-02 8.00074935e-02
 8.07626992e-02 8.23494643e-02 8.32414255e-02 8.50510970e-02
 8.55698586e-02 8.61051381e-02 8.66542757e-02 8.69159847e-02
 8.69250745e-02 8.69250819e-02 8.98697823e-02 8.98697898e-02
 9.12535116e-02 9.28289369e-02 9.52064693e-02 9.52561796e-02
 9.60477963e-02 9.60478038e-02 9.72120166e-02 9.95404422e-02
 1.00704663e-01 1.03483319e-01 1.03487864e-01 1.06210150e-01
 1.06210157e-01 1.08773753e-01 1.10975251e-01 1.11764707e-01
 1.12346813e-01 1.13511041e-01 1.13782331e-01 1.14675246e-01
 1.15299284e-01 1.17408089e-01 1.20041549e-01 1.20041557e-01
 1.22809075e-01 1.26317412e-01 1.27481624e-01 1.28322944e-01
 1.28322959e-01 1.28645837e-01 1.29592612e-01 1.29631117e-01
 1.30825192e-01 1.35193124e-01 1.36485681e-01 1.36599794e-01
 1.38196826e-01 1.38525963e-01 1.38888642e-01 1.38888657e-01
 1.40287995e-01 1.40369043e-01 1.41452208e-01 1.41452223e-01
 1.42616421e-01 1.44184917e-01 1.44881189e-01 1.45017907e-01
 1.45964295e-01 1.46287143e-01 1.49019599e-01 1.49019614e-01
 1.49019629e-01 1.52260914e-01 1.87907502e-01 1.87907517e-01
 2.18003228e-01 2.20588237e-01 2.20588252e-01 2.36928612e-01
 2.36928627e-01 2.69609332e-01 2.69609362e-01 2.93064505e-01
 2.94117630e-01 2.94117659e-01 2.94117689e-01 2.55000000e+02]

OK, worked on it a bit more and found that part of my issue is that conversion to a torch tensor scales everything to [0, 1]. I can get around this by making a Lambda transform where I cast to a numpy array with dtype int64 and convert that to a tensor.
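For anyone hitting the same wall, the scaling behaviour is easy to reproduce without torchvision; here is a minimal numpy sketch (the 0/38/75 values are just the class pixel values from my masks):

```python
import numpy as np

# Toy labelme-style mask: class indices stored as pixel values 0 / 38 / 75
mask = np.array([[0, 38], [75, 0]], dtype=np.uint8)

# What transforms.ToTensor() effectively does to a uint8 input:
# scale to float32 in [0, 1], so the labels become useless fractions
scaled = mask.astype(np.float32) / 255.0

# What a segmentation target needs: the raw integer labels, untouched
labels = mask.astype(np.int64)

print(np.unique(scaled))  # fractional values like 0.149..., 0.294...
print(np.unique(labels))  # [ 0 38 75]
```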

So now my mask shows up correctly, but…

I still get a slew of other values when I finish my transforms! Furthermore, the rest of my code now hangs…

Now I am really stumped…

Before my padding function

####torch.unique output
tensor([ 0, 38, 75])
#printing the tensor
tensor([[[75, 75, 75,  ..., 75, 75, 75],
         [75, 75, 75,  ..., 75, 75, 75],
         [75, 75, 75,  ..., 75, 75, 75],
         [ 0,  0,  0,  ...,  0,  0,  0],
         [ 0,  0,  0,  ...,  0,  0,  0],
         [ 0,  0,  0,  ...,  0,  0,  0]]])


After my padding function

####torch.unique output
tensor([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,  10,  11,  12,  13,
         14,  15,  16,  17,  18,  19,  20,  21,  22,  23,  24,  25,  26,  27,
         28,  29,  30,  31,  32,  33,  34,  35,  36,  37,  38,  56,  59,  67,
         69,  75, 255])

#printing the tensor
tensor([[[255, 255, 255,  ..., 255, 255, 255],
         [255, 255, 255,  ..., 255, 255, 255],
         [255, 255, 255,  ..., 255, 255, 255],
         [255, 255, 255,  ..., 255, 255, 255],
         [255, 255, 255,  ..., 255, 255, 255],
         [255, 255, 255,  ..., 255, 255, 255]]])

my transforms

mask_transform = transforms.Compose([
    transforms.Lambda(tensor_to_numpy_preserve_scale),
    transforms.Lambda(pad_mask_with_aspect),
])

def tensor_to_numpy_preserve_scale(x):
    # casting to int64 first keeps ToTensor() from rescaling the labels to [0, 1]
    return transforms.ToTensor()(np.array(x, dtype='int64'))

def pad_mask_with_aspect(x):
    # pad the shorter side so the mask becomes IMG_SIZE x IMG_SIZE
    c, h, w = x.shape
    h_new = int(IMG_SIZE * h / w)
    pad_amount = (IMG_SIZE - h_new) // 2
    square_image = transforms.Pad((0, pad_amount), fill=255, padding_mode='constant')
    return square_image(x)

You are padding the mask with fill=255, so the newly added values are expected. Change it to e.g. 0 if you want to pad with that value.

Also, I would recommend using InterpolationMode.NEAREST in Resize on the mask, since BILINEAR is used by default and could also change your values.
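To illustrate why the interpolation mode matters, here is a toy numpy sketch (1-D linear interpolation standing in for BILINEAR; the row values are hypothetical class labels):

```python
import numpy as np

# One row of a label mask with three classes
row = np.array([0, 0, 38, 38, 75, 75], dtype=np.float32)
new_len = 4
# sampling positions of the resized row in the original row
pos = np.linspace(0, len(row) - 1, new_len)

# NEAREST: snap to the closest source pixel -> only valid labels survive
nearest = row[np.round(pos).astype(int)]

# BILINEAR (1-D linear here): blend the two surrounding pixels,
# which invents label values belonging to no class
lo = np.floor(pos).astype(int)
hi = np.minimum(lo + 1, len(row) - 1)
frac = pos - lo
bilinear = (1 - frac) * row[lo] + frac * row[hi]

print(nearest)   # values drawn only from {0, 38, 75}
print(bilinear)  # fractional in-between values appear
```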

Thanks for the tip. I was able to sidestep the issue by not doing another resize after padding. I still need to try the interpolation mode to see if that would fix the issue, but now I have something more pressing.

Now that I have the label matrix, I am trying to turn it back into an RGB image to view it. I made a small dictionary, and I have some ugly code that gets me into C, H, W dimensions. The problem is my colors are on the scale [0, 255] but Pillow only takes [0, 1].

@ptrblck I saw a reply regarding normalization, but I do not think it is working in my case. Most of my colors turn white when I convert to PIL, which are not the colors I picked, so something with the conversion is wrong. Any ideas?

See my inefficient method of going from label to color below. I would also welcome some tips to make this snippet more efficient; maybe drop the inner loop somehow…

label2color = {0: [21, 21, 21], 1: [244, 232, 221], 2: [75, 156, 211], 3: [19, 41, 75]}
# 0 Background, 1 Bottom Platen, 2 Sample, 3 Top Platen
label_image = torch.zeros((3, IMG_SIZE, IMG_SIZE), dtype=torch.float)
for key in label2color:
    # indices of every pixel predicted as this class
    idx = (pred_labels == key).nonzero(as_tuple=False)
    print(f'key = {key} idx = {idx.shape[0]}')
    for i in range(idx.shape[0]):
        label_image[0, idx[i][1].item(), idx[i][2].item()] = label2color[key][0]
        label_image[1, idx[i][1].item(), idx[i][2].item()] = label2color[key][1]
        label_image[2, idx[i][1].item(), idx[i][2].item()] = label2color[key][2]
# attempt to normalize the colors into [0, 1] for ToPILImage
label_image -= label_image.min(1, keepdim=True)[0]
label_image /= label_image.max(1, keepdim=True)[0]

pred_image = transforms.ToPILImage()(label_image).convert("RGB")

wonky labels

PIL should take uint8 arrays with values in [0, 255], so you should be able to cast the dtype and pass it to PIL.
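Building on that, the double loop above can also be collapsed into a single palette lookup. A sketch, assuming pred_labels has already been squeezed to an (H, W) integer array:

```python
import numpy as np

# Same mapping as in the post above
label2color = {0: [21, 21, 21], 1: [244, 232, 221],
               2: [75, 156, 211], 3: [19, 41, 75]}

def labels_to_rgb(pred_labels):
    """Map an (H, W) integer label map to an (H, W, 3) uint8 RGB image
    with one fancy-indexing lookup instead of per-pixel loops."""
    palette = np.array([label2color[k] for k in sorted(label2color)],
                       dtype=np.uint8)   # shape (num_classes, 3)
    return palette[pred_labels]          # shape (H, W, 3)

demo = np.array([[0, 1], [2, 3]])
rgb = labels_to_rgb(demo)
print(rgb.shape)  # (2, 2, 3)
```

The result is already uint8 in [0, 255], so something like `PIL.Image.fromarray(rgb)` should give the color image directly, without the normalization step.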


@ptrblck OK great that worked. I can see the predicted image.

I had two follow-ups…

  1. I am not sure how to handle padding. I add it to make my image square for ease of use with models. I was setting the padding to 255 and ignoring that index in the loss function, but I still get predictions on it. I was hoping the model would leave those padded pixels unchanged. Since the model did take a guess, I suspect the padding is messing up the identification. See the example below. How does one handle this?

  2. The model seems great at outlining the silhouette, but it is calling the entire block one group when it really should be 3. How can I get the model to recognize the split, despite the fact that the pixel colors are similar within the silhouette? I was hoping features about the shape of the silhouette would help in splitting it. Is UNET not great for such purposes? I am training from scratch, and I saw you could use weights from another model to help. Also, I am doing this on my laptop, so my batch size has been 1, which I understand could greatly impact the loss calculation.

Any tips on how to go forward would be great.

  1. If you are padding the input with e.g. 255 and force the criterion to ignore this class index, I think your model should not learn anything to predict these outputs, and they might be random or the “default” class prediction. You could try to crop the output of the model back to the original spatial size and calculate the loss on it, but this should yield the same outcome.

  2. I’m not sure I understand the description completely. Is your model supposed to predict other classes “inside” the already predicted class? If so, I assume the number of class pixels could be imbalanced and your model fails to predict the minority classes. In that case you could try e.g. focal loss, which could counter this imbalance.
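The cropping idea in point 1 could be sketched like this; a toy numpy example where IMG_SIZE and pad are stand-ins for the real 512 and the computed pad_amount:

```python
import numpy as np

IMG_SIZE, pad = 8, 2                              # toy sizes; real run uses 512
logits = np.zeros((1, 4, IMG_SIZE, IMG_SIZE))     # fake (N, C, H, W) model output
target = np.full((1, IMG_SIZE, IMG_SIZE), 255)    # 255 marks the padded rows
target[:, pad:IMG_SIZE - pad, :] = 1              # real labels in the middle band

# Slice prediction and target back to the unpadded rows before the loss
logits_c = logits[:, :, pad:IMG_SIZE - pad, :]
target_c = target[:, pad:IMG_SIZE - pad, :]
print(logits_c.shape, target_c.shape)  # (1, 4, 4, 8) (1, 4, 8)
```

The same slicing works on torch tensors, so the criterion would then only ever see real label pixels.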

@ptrblck I would not describe them as subclasses; however, there is nothing separating the bottom (in red) from the sample (in green) in the image. I was thinking the top (yellow) would be easy to see since it is a separate shape, so the model would have to leverage the difference in widths (my hope!); in the raw image they are both the same color. Basically, the model is calling everything one class instead of the three in the label. It seems there was no attempt to use all three classes; the model picked one dominant class and assigned everything it could identify to it. I am trying to fix that. I am going to try to tune the model, just not sure where to start.