Ground-truth label normalization

(Keyur Paralkar) #1

I am currently working on the PASCAL VOC 2012 dataset and am a bit confused about image normalization. Do I need to normalize the ground-truth labels if I am normalizing my input images?
As an experiment, I normalized the ground-truth labels for the corresponding input images using transforms.Normalize(). When I try to train my network, I get an error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-12-416ca5ee56ac> in <module>()
     10 model2 =
---> 12 m1,loss_acc_dict = train_model(model=model2,criterion=criterion,optimizer=optimizer,num_epochs=175)

<ipython-input-10-1744505e89c5> in train_model(model, criterion, optimizer, scheduler, num_epochs)
     38             print(train_y)
     39             loss = criterion(outputs,train_y)
---> 40             epoch_loss += float(loss)
     42             #calculating the accuracy:

RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535491974311/work/aten/src/THC/generic/THCStorage.cpp:36

Here is my code for train helper function:

#Create a function for training a model.
def train_model(model, criterion, optimizer, scheduler=None, num_epochs=10):
#     since = time.time() #Start the timer
    model.train() #Put in train mode.
    total_acc = []
    total_loss = []
    total_iou = []
    for epochs in range(num_epochs):
        epoch_loss = 0.0
        epoch_acc = 0
        epoch_iou = 0
        print("Epoch {}/{}".format(epochs,num_epochs-1))
        iteration = 0
        total_iteration = len(train_images_dataloader)
        for batch, data in enumerate(train_images_dataloader):
            #Keeping count of current iteration
            iteration += 1
            #loading the data on the GPU:
            train_x, train_y = data['image_x'].to(device), data['image_y'].to(device)
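            #zero the parameter gradients:
            optimizer.zero_grad()
            #forward pass:
            outputs = model(train_x)
            #compute loss:
            #squeeze only the channel dimension so a batch of size 1 keeps its batch axis
            train_y = train_y.squeeze(1).long()
            loss = criterion(outputs,train_y)
            #accumulate a Python float instead of the tensor, which would keep the graph alive
            epoch_loss += loss.item()
            #calculating the accuracy:
            _, predicted_indices = torch.max(outputs,1)
            total = train_y.size(0)*train_y.size(1)*train_y.size(2)
            correct = (predicted_indices == train_y).sum().item()
            acc = (correct/total)
            epoch_acc += float(acc)
            m_iou = compute_iou(predicted_indices,train_y)
            epoch_iou +=m_iou
            print('Iteration = {}/{} Loss = {} Accuracy = {}, IOU = {}'.format(iteration,total_iteration,loss.item(),acc,m_iou),end="\r")
            #update the weights:
            loss.backward()
            optimizer.step()
            #Memory management:
            del outputs,train_x,train_y
        #store the per-epoch averages so the totals below are not computed on empty lists
        total_loss.append(epoch_loss/total_iteration)
        total_acc.append(epoch_acc/total_iteration)
        total_iou.append(epoch_iou/total_iteration)
        writer.add_scalar('Train/Mean IOU',(epoch_iou/total_iteration),global_step=epochs)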
    print('Total accuracy = ',sum(total_acc)/num_epochs)
    print('Total Loss = ',sum(total_loss)/num_epochs)
    print('Total mean IOU = ',sum(total_iou)/num_epochs)
    return model,{'Accuracy':total_acc,'Loss':total_loss,'mIOU':total_iou} #divide by num_epochs and then store in dict.

Custom dataset loader:

#Creating custom dataloader for PASCAL VOC 2012

#__init__ : class variables such as images_path, transform etc.
#__len__ : returns the length of the entire dataset
#__getitem__ : returns the ith image from the dataset

class PascalVocDataset(Dataset):
    def __init__(self,anno_path,ip_images_path,seg_images_path,transform=None):
        #anno_path = path to train.txt file for image identifiers
        #ip_images_path = path to input image folders
        #seg_images_path = path to segmentation maps class folder
        self.anno_path = anno_path
        self.ip_images_path = ip_images_path
        self.seg_images_path = seg_images_path
        self.transform = transform
    def getImagesNames(self,path):
        #Returns image identifiers from file_name
        #path = .txt location
        images_names = []
        with open(path) as fp:
            #strip() removes the trailing newline without cutting a character off a last line that has none
            images_names = [line.strip() for line in fp]
        return images_names
    def __len__(self):
        return len(self.getImagesNames(self.anno_path))
    def __getitem__(self, idx):
        file_names = self.getImagesNames(self.anno_path)
        file_path_x = self.ip_images_path+file_names[idx]+'.jpg'
        file_path_y = self.seg_images_path+file_names[idx]+'.png'
        #X images:
        im =
        image_x = im.convert('RGB')
        #Y images:
        img =
#         img = img.resize((224,224),Image.ANTIALIAS)
#         image_y = torch.tensor(np.array(img),dtype=torch.long)
        sample = {'image_x':image_x,'image_y':img}
        if self.transform:
            sample['image_x'] = self.transform(sample['image_x'])
            sample['image_y'] = self.transform(sample['image_y'])

        return sample
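
A minimal sketch of keeping the two transforms separate, so that only the input image is normalized. It assumes the image arrives as an H x W x 3 uint8 array and the mask as an H x W uint8 array of class indices. The function name `prepare_sample` and the ImageNet mean/std values are illustrative assumptions, not part of the code above.

```python
import numpy as np

# ImageNet statistics, often used with pretrained backbones (an assumption here).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def prepare_sample(image, mask):
    """Normalize the input image only; keep the mask as integer class indices."""
    x = image.astype(np.float32) / 255.0      # scale pixel values to [0, 1]
    x = (x - IMAGENET_MEAN) / IMAGENET_STD    # per-channel normalization
    x = np.transpose(x, (2, 0, 1))            # HWC -> CHW, the layout the model expects
    y = mask.astype(np.int64)                 # labels stay untouched, only the dtype changes
    return x, y


image = np.full((4, 4, 3), 128, dtype=np.uint8)
mask = np.array([[0, 0, 15, 15]] * 4, dtype=np.uint8)
x, y = prepare_sample(image, mask)
print(x.shape, x.dtype, y.dtype, np.unique(y))
```

The same split can be made inside `__getitem__` by passing two transforms to the dataset instead of one shared `transform`.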

(Arul) #2

I assume that you are talking about PASCAL VOC 2012 segmentation dataset.
In my view, it depends on how you are calculating the loss. When you are using CrossEntropyLoss or NLLLoss2d (per pixel), you should not normalize the ground truth.

What kind of loss function are you using?

(Keyur Paralkar) #3

I am using CrossEntropyLoss. So I should just normalize the inputs? If I normalize only the inputs, wouldn’t that produce a greater loss, since the input and the ground truth now have different value ranges?

(Arul) #4

Yes, in my view you should normalize only the inputs.
The given ground truths are not images with ‘pixel values’; they are labels that say which class each pixel belongs to. I don’t see a reason to normalize the labels.
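
To illustrate why normalized labels break a per-pixel classification loss, here is a small NumPy sketch. It assumes the mask was scaled to [0, 1], normalized with a mean and std (0.485 and 0.229 are illustrative values), and then cast to integers, as `.long()` does in the training loop. The helper name `invalid_targets` is hypothetical. Target indices outside [0, num_classes - 1] are a common cause of the device-side assert on CUDA.

```python
import numpy as np

NUM_CLASSES = 21     # 20 object classes plus background in PASCAL VOC
IGNORE_INDEX = 255   # VOC marks void/boundary pixels with 255


def invalid_targets(target, num_classes=NUM_CLASSES, ignore_index=IGNORE_INDEX):
    """Return the sorted label values that a classification loss cannot index."""
    values = np.unique(target)
    bad = (values < 0) | (values >= num_classes)
    bad &= values != ignore_index
    return values[bad].tolist()


mask = np.array([[0, 15], [15, 255]], dtype=np.uint8)

# What happens when the labels go through the image normalization:
normalized = (mask.astype(np.float32) / 255.0 - 0.485) / 0.229
as_long = normalized.astype(np.int64)  # the cast truncates toward zero

print(invalid_targets(mask.astype(np.int64)))  # raw labels
print(invalid_targets(as_long))                # normalized labels
```

The raw labels pass the check, while the normalized ones contain negative indices and no longer correspond to the original classes. The value 255 only passes because it is treated as an ignore label, which assumes the loss is created with `ignore_index=255`.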

(Keyur Paralkar) #5

This makes sense. So whether to normalize the ground-truth labels depends entirely on the problem we are working on, right?

(Keyur Paralkar) #6

I was trying to normalize the training set because I was getting blank predictions after training. That’s why I thought I should normalize the ground-truth labels as well.


That might be a problem with your model, if it only outputs one label (e.g. the background class), or a visualization issue.
If you call torch.argmax on your output along the class dimension, you will get the most likely classes as indices.
Some libraries like matplotlib try to plot these “images” as uint8 in the range [0, 255].
As your class labels might be small, e.g. in [0, 5], you might see only a dark image.
Try to convert your labels to colors using a colormap in this case.
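
As a sketch of the colormap idea, the label indices can be used to index a lookup table of RGB colors. The palette below follows the bit-shifting scheme commonly used for PASCAL VOC visualizations. The function names `voc_palette` and `colorize` are illustrative.

```python
import numpy as np


def voc_palette(num_colors=256):
    """Build an RGB lookup table by spreading the bits of each class index over the channels."""
    palette = np.zeros((num_colors, 3), dtype=np.uint8)
    for index in range(num_colors):
        value, r, g, b = index, 0, 0, 0
        for shift in range(7, -1, -1):
            r |= (value & 1) << shift
            g |= ((value >> 1) & 1) << shift
            b |= ((value >> 2) & 1) << shift
            value >>= 3
        palette[index] = (r, g, b)
    return palette


def colorize(labels, palette):
    """Map an (H, W) array of class indices to an (H, W, 3) RGB image."""
    return palette[labels]


palette = voc_palette()
labels = np.array([[0, 1], [2, 15]])
rgb = colorize(labels, palette)
print(rgb.shape, palette[1], palette[15])
```

Neighbouring class indices get clearly different colors this way, so small label values no longer show up as a nearly black image.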

(Keyur Paralkar) #8

This is my method of replacing indices with higher pixel intensity values:

#For generating colormap of given image:
def replace_pixels(img):

    labelsTC = [x for x in range(200,246)]
    temp_mask = np.zeros((img.shape),


    #Traverse the given img array and compare each point as below:
    #    if point >= 0 and point <= 20:
    #        temp_mask[i] = labelsTC[point]


    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
                if(img[i,j]>=1 and img[i,j]<=20):
                    temp_mask[i,j] = labelsTC[img[i,j]]

    return temp_mask
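
The nested loops can be replaced by a single array lookup. This is a sketch under the assumption that `img` is a 2-D integer array of class indices and that the output should be `uint8` (the dtype is not visible in the post). The name `replace_pixels_vectorized` is hypothetical.

```python
import numpy as np


def replace_pixels_vectorized(img):
    """Map class indices 1..20 to intensities 201..220; everything else stays 0."""
    labels_tc = np.arange(200, 246, dtype=np.uint8)  # same values as labelsTC
    in_range = (img >= 1) & (img <= 20)              # same condition as in the loop
    out = np.zeros(img.shape, dtype=np.uint8)
    out[in_range] = labels_tc[img[in_range]]
    return out


img = np.array([[0, 1], [20, 255]])
print(replace_pixels_vectorized(img))
```

Note that the resulting intensities differ by only 1 between neighbouring classes, so different classes remain hard to tell apart by eye.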

And these are the results that I get after visualisation:
Image 1, 2, 3 are the Input image, predictions (segmentation maps), ground-truth labels respectively.


So your predictions are not blank but just look wrong.
Do you just want to predict the edges or the classes as well?
labelsTC seems to be some kind of color map with a small range. Did you try to use a colormap from e.g. matplotlib to spread the colors a bit?

(Keyur Paralkar) #10

I want to obtain segmentation maps at least equivalent to the ground truth. The model I am using here is FCN-32. No, I have not tried a colormap from matplotlib to spread the colors. Can you provide an example of how to do it? Sorry for my limited knowledge.

(Keyur Paralkar) #11

Yes @ptrblck, labelsTC is a color map with a small range.

(Keyur Paralkar) #12

After changing the color map to ‘Paired’, I got the following segmentation maps:

In some cases, such as detecting a boat in an image and generating its segmentation map, my current model fails completely and gives blank predictions. For other examples, like the ones displayed above, it gives better results. Does this mean that I should use an FCN-16 or FCN-8 architecture for better accuracy?


It’s hard to answer which architecture will definitely work better.
Currently your model seems to fail at the segmentation task.
However, the model might still be a good fit, but the hyper-parameters, e.g. learning rate, are messing up the training.
Have you checked your per-class accuracies? Most likely some classes will be completely ignored.
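
One way to get per-class numbers is a confusion matrix accumulated over the validation set. The following is a minimal NumPy sketch, assuming predictions and targets are integer arrays of class indices and that 255 marks pixels to ignore. The function names are illustrative.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)
    keep = target != ignore_index
    combined = target[keep] * num_classes + pred[keep]
    counts = np.bincount(combined, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_metrics(cm):
    """Return per-class pixel accuracy and IoU; classes that never occur yield NaN."""
    tp = np.diag(cm).astype(np.float64)
    gt = cm.sum(axis=1)         # pixels that belong to each class
    predicted = cm.sum(axis=0)  # pixels predicted as each class
    with np.errstate(divide="ignore", invalid="ignore"):
        accuracy = tp / gt
        iou = tp / (gt + predicted - tp)
    return accuracy, iou


target = np.array([0, 0, 1, 1, 2, 255])
pred = np.array([0, 1, 1, 1, 0, 2])
cm = confusion_matrix(pred, target, num_classes=3)
accuracy, iou = per_class_metrics(cm)
print(cm)
print(accuracy, iou)
```

A class whose row has a zero on the diagonal is never predicted correctly, which is the "completely ignored" case mentioned above.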

(Keyur Paralkar) #14

How can I calculate per-class accuracies? Using a confusion matrix?
Also, this is my implementation of the FCN-32 model. Is this the correct implementation:

(Arul) #15

Why did you decide not to use any non-linearity in your decoder part?
Is FCN not using it?
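
One reason the question matters: without a non-linearity, stacked linear layers collapse into a single linear map, so additional layers add no expressive power. A small NumPy sketch with fully connected layers and arbitrary shapes (bias-free convolutions are linear as well, so the same argument applies to them):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))     # batch of 5 inputs with 8 features
w1 = rng.standard_normal((8, 16))   # first "layer"
w2 = rng.standard_normal((16, 4))   # second "layer"

stacked = (x @ w1) @ w2                   # two linear layers applied in sequence
collapsed = x @ (w1 @ w2)                 # one equivalent linear layer
with_relu = np.maximum(x @ w1, 0.0) @ w2  # ReLU in between breaks the equivalence

print(np.allclose(stacked, collapsed))
print(np.allclose(with_relu, collapsed))
```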

(Keyur Paralkar) #16

I haven’t used any non-linearity in the FCN decoder because leaving it out gave me better results, i.e. higher accuracy, though not as high as I expected. I also thought ReLU might be causing my values to diminish.

(Keyur Paralkar) #17

I have read in some discussions on the PyTorch forums that the current PyTorch implementation of VGG16 doesn’t perform as well as the original Oxford VGG16. I will try to implement the original VGG16 and copy its weights, or try ResNet instead.