RAM usage increases linearly

Hi, the code below increases memory usage linearly, and at a certain point I am no longer able to train the model. Surprisingly, this is the first time I am facing a problem with this code.

Doubts:

  1. Vector images: the vector images are the only new data involved in the code below, and commenting out the line which loads them makes the code run normally.

I really have no idea, any hint or suggestion would be highly appreciated.
Thank you,
Nilesh Pandey

import os
import os.path as osp
import random

import numpy as np
import matplotlib.pyplot as plt
import imgaug.augmenters as iaa
from PIL import Image
from skimage.exposure import rescale_intensity  # assumed source of rescale_intensity
from skimage.transform import resize            # assumed source of resize
import torchvision.transforms.functional as TF
from torch.utils import data

class ImgAugTransform:
    def __init__(self):
        sometimes = lambda aug: iaa.Sometimes(0.5, aug)  # currently unused helper

        self.aug = iaa.Sequential([
            iaa.Affine(
                translate_percent={"x": 0.2, "y": 0.1},
                rotate=40,
                mode='symmetric'
            )
        ])
    def __call__(self, img, img1, img2, img3):
        img = np.array(img)
        img1 = np.array(img1)
        img2 = np.array(img2)
        img3 = np.array(img3)

        return (self.aug.augment_image(img), self.aug.augment_image(img1),
                self.aug.augment_image(img2), self.aug.augment_image(img3))

class Dataset(data.Dataset):
    """Dataset for XXX."""
    def __init__(self, height):
        super(Dataset, self).__init__()
        # base setting
        self.files = []
        self.vector = []
        with open("/home/XXX/train.txt", 'r') as f:
            for line in f.readlines():
                im_name, v_name = line.strip().split()
                self.files.append(im_name)
                self.vector.append(v_name)

        self.masks = os.listdir("/home/XXX/")
        self.range = np.arange(len(self.masks))
        self.rotate = ImgAugTransform()
    def name(self):
        return "XXX"

    def transformData(self, src, mask, target, ref_lr):
        if random.random() > 0.5:
            src, mask, target, ref_lr = self.rotate(src, mask, target, ref_lr)

        # Transform to tensor
        src = TF.to_tensor(src)
        mask = TF.to_tensor(mask)
        target = TF.to_tensor(target)
        ref_lr = TF.to_tensor(ref_lr)

        src = TF.normalize(src, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        mask = TF.normalize(mask, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        target = TF.normalize(target, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ref_lr = TF.normalize(ref_lr, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        return src, mask, target, ref_lr

    
    def __getitem__(self, index):
        file = self.files[index]
        mask = self.masks[random.choice(self.range)]
        vector = self.vector[index]
        # person image
        targ = rescale_intensity(plt.imread(osp.join('/homeXXX/', file)) / 255)
        vec = rescale_intensity(plt.imread(osp.join('/home/XXX', vector)) / 255)
        mask = rescale_intensity(plt.imread(osp.join('/home/XXX', 'maskA', mask)) / 255)

        targ = resize(targ, (256, 256))
        vec = resize(vec, (256, 256))
        mask = resize(mask, (256, 256))

        # broadcast the single-channel mask to three channels
        ms2 = mask * 1
        ms2 = np.expand_dims(ms2, axis=2)
        ms2 = np.repeat(ms2, repeats=3, axis=2)

        # white out the masked region of the target
        src = targ * (1 - ms2) + ms2

        src = Image.fromarray(np.uint8(src * 255))
        mask = Image.fromarray(np.uint8(ms2 * 255))
        target = Image.fromarray(np.uint8(targ * 255))
        vec = Image.fromarray(np.uint8(vec * 255))
        source, mask, target, ref = self.transformData(src, mask, target, vec)
        return source, mask, target, ref

I have tried running all my previous code and projects, and basically all of them cause RAM consumption to increase linearly.
The only change in my hardware setup is reduced storage space: I currently have less than 40 GB free. As far as I know that shouldn't matter, but does anyone think it could be related?
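A quick way to confirm the growth described above (not from the original thread) is to log the process's resident memory once per iteration; this sketch uses only the standard library's `resource` module (Unix-only), with a list allocation standing in for a training step:

```python
import resource

def rss_mb():
    """Peak resident set size of this process in MB (ru_maxrss is KB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

baseline = rss_mb()
for i in range(5):
    _ = [0] * 1_000_000  # hypothetical stand-in for one training iteration
    print(f"iter {i}: {rss_mb() - baseline:.1f} MB above baseline")
```

If the printed value keeps climbing iteration after iteration instead of plateauing, something is accumulating.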

If I understand your issue correctly, commenting out this line:

vec = rescale_intensity(plt.imread(osp.join('/home/XXX', vector))/255)

yields normal behavior, while keeping it increases the memory?
How did you define rescale_intensity?

@nile649
Your projects were running fine, and now, after changing the storage, you are noticing increasing memory usage? Did you change anything else (PyTorch version, etc.)?

Actually, I just cross-checked with previous projects, and it is happening with them as well. I doubt it is the code; it must be something else. I did look at previous posts about assigning different variables, not using lists, and so on, but this seems to be a different problem.

All my research projects over the last year use the same dataloader, and this is the first time I am having memory issues, even with old projects that used to run fine.

Check if you are accidentally storing the computation graph, e.g. by accumulating the loss without detaching it or by appending it to a list.
Could you create a minimal code snippet to reproduce this issue?
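The failure mode described above can be sketched in a few lines; the tensors and the loop here are illustrative, not from the original code:

```python
import torch

x = torch.randn(8, requires_grad=True)

# Leaky pattern: each stored loss tensor keeps its whole autograd graph alive,
# so memory grows with every iteration.
losses = []
for _ in range(3):
    loss = (x ** 2).sum()
    losses.append(loss)              # still attached to the graph
print(losses[0].requires_grad)       # True -> graph retained

# Safe pattern: detach (or call .item()) before storing,
# so the graph can be freed after each step.
safe = []
for _ in range(3):
    loss = (x ** 2).sum()
    safe.append(loss.detach())       # plain tensor, no graph
print(safe[0].requires_grad)         # False
```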


Not sure if it's related, but I am having the same issue with the Mask R-CNN PyTorch version.

You're right. I debugged step by step and the following line is the culprit: I was treating the loss term as a scalar while calculating the average loss over the epoch.

avg_loss_g = (avg_loss_g+loss_G)/(i+1) 

So if the problem is not with the variables in the dataloader, it is with the variables in the main function.
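A hedged sketch of one possible fix, using hypothetical stand-in values for `loss_G`: calling `.item()` converts the loss to a plain Python float so the autograd graph behind it can be freed, and the average is computed as a running sum divided by the count (which also differs from the original `(avg + loss)/(i+1)` expression):

```python
import torch

# Hypothetical stand-ins for the per-iteration generator losses;
# each is a tensor attached to an autograd graph, like loss_G would be.
losses_g = [torch.tensor(0.5, requires_grad=True) * 2 for _ in range(4)]

running_sum = 0.0
for i, loss_G in enumerate(losses_g):
    running_sum += loss_G.item()     # .item() detaches; the graph can be freed
    avg_loss_g = running_sum / (i + 1)
print(avg_loss_g)  # 1.0
```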

We are talking about CPU Memory, right?

Yes, CPU memory.

This, together with the @ptrblck response, made me understand the potential causes of memory leaks in PyTorch. I suggest reading through the links I have posted.