RAM usage increases linearly

Hi, the code below increases memory usage linearly, and at a certain point I am no longer able to train the model. Surprisingly, this is the first time I am facing a problem with this code.


  1. Vector images: the vector image is the only new data involved in the following code; commenting out the line that loads the vector images makes the code run normally.

I really have no idea; any hint or suggestion would be highly appreciated.
Thank you,
Nilesh Pandey

# Imports inferred from the usage below
import os, random, os.path as osp
import numpy as np
import imgaug.augmenters as iaa
import matplotlib.pyplot as plt
import torchvision.transforms.functional as TF
from PIL import Image
from skimage.exposure import rescale_intensity
from skimage.transform import resize
from torch.utils import data

class ImgAugTransform:
    def __init__(self):
        sometimes = lambda aug: iaa.Sometimes(0.5, aug)

        # The Sequential list was cut off in the post; translate_percent
        # belongs to iaa.Affine, so a minimal reconstruction is:
        self.aug = iaa.Sequential([
            sometimes(iaa.Affine(translate_percent={"x": 0.2, "y": 0.1})),
        ])

    def __call__(self, img, img1, img2, img3):
        img = np.array(img)
        img1 = np.array(img1)
        img2 = np.array(img2)
        img3 = np.array(img3)
        # note: each augment_image call draws new random parameters
        return (self.aug.augment_image(img), self.aug.augment_image(img1),
                self.aug.augment_image(img2), self.aug.augment_image(img3))

class Dataset(data.Dataset):
    """Dataset for XXX."""

    def __init__(self, height):
        super(Dataset, self).__init__()
        # base setting
        self.height = height
        self.files = []
        self.vector = []
        with open("/home/XXX/train.txt", 'r') as f:
            for line in f.readlines():
                im_name, v_name = line.strip().split()
                # store the parsed names (this append was missing in the
                # original post but is required by __getitem__ below)
                self.files.append(im_name)
                self.vector.append(v_name)
        self.masks = os.listdir("/home/XXX/")
        self.range = np.arange(len(self.masks))
        self.rotate = ImgAugTransform()

    def __len__(self):
        # required by DataLoader; one sample per line of train.txt
        return len(self.files)

    def name(self):
        return "XXX"
    def transformData(self, src, mask, target, ref_lr):
        # Randomly augment all four images half of the time
        if random.random() > 0.5:
            src, mask, target, ref_lr = self.rotate(src, mask, target, ref_lr)

        # Transform to tensors and normalize to [-1, 1]
        src = TF.to_tensor(src)
        mask = TF.to_tensor(mask)
        target = TF.to_tensor(target)
        ref_lr = TF.to_tensor(ref_lr)
        src = TF.normalize(src, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        mask = TF.normalize(mask, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        target = TF.normalize(target, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        ref_lr = TF.normalize(ref_lr, (0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
        return src, mask, target, ref_lr

    def __getitem__(self, index):
        file = self.files[index]
        mask = self.masks[random.choice(self.range)]
        vector = self.vector[index]
        # person image
        targ = rescale_intensity(plt.imread(osp.join('/home/XXX/', file)) / 255)
        vec = rescale_intensity(plt.imread(osp.join('/home/XXX', vector)) / 255)
        mask = rescale_intensity(plt.imread(osp.join('/home/XXX', 'maskA', mask)) / 255)

        targ = resize(targ, (256, 256))
        vec = resize(vec, (256, 256))
        mask = resize(mask, (256, 256))
        # broadcast the single-channel mask to three channels
        ms2 = mask * 1
        ms2 = np.expand_dims(ms2, axis=2)
        ms2 = np.repeat(ms2, repeats=3, axis=2)
        # white out the masked region of the target image
        src = targ * (1 - ms2) + ms2

        src = Image.fromarray(np.uint8(src * 255))
        mask = Image.fromarray(np.uint8(ms2 * 255))
        target = Image.fromarray(np.uint8(targ * 255))
        vec = Image.fromarray(np.uint8(vec * 255))
        source, mask, target, ref = self.transformData(src, mask, target, vec)
        return source, mask, target, ref

I tried running all my previous code and projects, and basically all of them show the same linear increase in RAM consumption.
The only change in my hardware setup is lower storage space; I am currently on less than 40 GB of free storage. As far as I know that shouldn’t matter, but does anyone think it could be related?
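
A minimal way to watch the growth is to print the process RSS while iterating the loader (a sketch using psutil; the height and batch size here are placeholders):

import os
import psutil
from torch.utils.data import DataLoader

process = psutil.Process(os.getpid())
loader = DataLoader(Dataset(height=256), batch_size=8, num_workers=0)

for i, (source, mask, target, ref) in enumerate(loader):
    # RSS should stay roughly flat; steady growth points to a leak
    print(f"iter {i}: rss = {process.memory_info().rss / 1024**2:.1f} MB")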

If I understand your issue correctly, commenting out this line:

vec = rescale_intensity(plt.imread(osp.join('/home/XXX', vector))/255)

yields normal behavior, while keeping it increases the memory usage?
How did you define rescale_intensity?

Your projects were running fine, and now, after changing the storage, you are noticing increasing memory usage? Did you change anything else (PyTorch version, etc.)?

Actually, I just cross-checked with previous projects, and it is happening with them as well. I doubt it is the code; it must be something else. I did look at previous posts about assigning different variables, not using lists, and so on, but this seems to be a different problem.

All my research projects over the last year use the same dataloader, and this is the first time I am having a memory issue, even with the old projects.

Check if you are accidentally storing the computation graph, e.g. by accumulating the loss without detaching it or by storing it in a list.
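For example, a pattern like this leaks, while the fix is a one-liner (a minimal sketch):

import torch

model = torch.nn.Linear(10, 1)
criterion = torch.nn.MSELoss()
losses = []

for _ in range(100):
    out = model(torch.randn(4, 10))
    loss = criterion(out, torch.randn(4, 1))

    # Bad: the tensor keeps its computation graph (and all intermediate
    # tensors) alive, so memory grows with every appended entry
    # losses.append(loss)

    # Good: .item() (or .detach()) stores only the value
    losses.append(loss.item())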
Could you try to create a minimal code snippet to reproduce this issue?


Not sure if related, but I am having the same issue with the Mask R-CNN PyTorch version.

You’re right. I debugged step by step, and the following line is the culprit: I was treating the loss tensor as if it were a scalar while calculating the average loss over the epoch.

avg_loss_g = (avg_loss_g + loss_G) / (i + 1)
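
Since loss_G is still attached to the computation graph, this accumulation keeps every iteration’s graph alive. Converting it to a Python float first stops the graph from being retained (a sketch with my variable names):

# accumulate only the Python float, not the graph-attached tensor
avg_loss_g = (avg_loss_g + loss_G.item()) / (i + 1)

# (aside: a true running mean would be
# avg_loss_g = (avg_loss_g * i + loss_G.item()) / (i + 1))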

If the problem is not with the variables in the dataloader, then it is with the variables in the main function.

We are talking about CPU memory, right?

Yes, CPU memory.

This, together with @ptrblck’s response, helped me understand the potential causes of memory leaks in PyTorch. I suggest reading through the links I have posted.