Memory increase while reading batches

I am having trouble with a steadily increasing memory issue.

Every time I process a batch, memory usage increases, and eventually I run out of memory.

I am sure it is not related to the training process, because even if I comment out the training part, memory keeps accumulating just from reading the batches.

Below is my code snippet.

data is a list of lists in the following format:
[[1,2,3],[4,5,6], ...]

for epoch in range(numEpoch):
    filename = 'somefile.pkl'

    data = pkl.load(open(filename, 'rb'))
    train_by_dataloader = Dataset_triplet(data)
    train_loader = DataLoader(dataset=train_by_dataloader, batch_size=self.batch_size, shuffle=True)

    for batch_idx, batch in enumerate(train_loader):
        u = Variable(batch['u'])
        i = Variable(batch['i'])
        j = Variable(batch['j'])

        # ... memory keeps accumulating here ...

and my Dataset class (the one I pass to the DataLoader) is

class Dataset_triplet(Dataset):
    def __init__(self, totalData):
        self.totalData = totalData

    def __len__(self):
        return len(self.totalData)

    def __getitem__(self, idx):
        # each row of totalData is a [u, i, j] triple
        row = self.totalData[idx]
        return {'u': row[0], 'i': row[1], 'j': row[2]}

I am guessing that my DataLoader has some problem…

Can anyone help me out?

Hm, I can’t see an issue right now, but for debugging purposes, have you tried to just iterate over the tensors without casting them to Variables?
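
For example, something like this (a minimal sketch reusing your train_loader, nothing else changed):

# read the batches only, with no Variable wrapping and no model calls,
# to check whether memory still grows from the DataLoader alone
for batch_idx, batch in enumerate(train_loader):
    u = batch['u']  # plain tensors as returned by the DataLoader
    i = batch['i']
    j = batch['j']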

Can you post the missing code, please?
Are you somehow storing the loss from your calculations?
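
I mean something along these lines (a made-up minimal example, not your code), where keeping the loss Variable itself retains its whole computation graph:

import torch
from torch.autograd import Variable

x = Variable(torch.randn(10), requires_grad=True)

losses_bad, losses_ok = [], []
for step in range(100):
    loss = (x * 2).sum()
    losses_bad.append(loss)        # keeps the Variable plus its graph alive -> memory grows
    losses_ok.append(loss.item())  # keeps only a plain Python float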

Even if I did, that shouldn’t be a problem for now because the memory increases even without that part.

Anyway, the rest of the training code looks like this:

def training(self):
    model = modeler(num1, num2, K)
    if self.cuda_available:
        model = model.cuda(self.cuda)

    criterion = torch.nn.MarginRankingLoss(margin=self.margin)
    optimizer = optim.SGD(model.parameters(), lr=self.lRate)

    for epoch in range(numEpoch):
        filename = 'somefile.pkl'
        totalLoss = 0

        data = pkl.load(open(filename, 'rb'))
        train_by_dataloader = Dataset_triplet(data)
        train_loader = DataLoader(dataset=train_by_dataloader, batch_size=self.batch_size, shuffle=True)

        for batch_idx, batch in enumerate(train_loader):
            u = Variable(batch['u'])
            i = Variable(batch['i'])
            j = Variable(batch['j'])
            optimizer.zero_grad()

            pos, reg = model(u, i)
            neg, _ = model(u, j)

            if self.cuda_available:
                loss = criterion(pos, neg, Variable(torch.FloatTensor([-1])).cuda(self.cuda))
            else:
                loss = criterion(pos, neg, Variable(torch.FloatTensor([-1])))

            loss += self.reg1 * reg
            loss.backward()
            optimizer.step()

            totalLoss += loss.item()

I tried that but the memory still increases.

Actually, even if I comment out these lines

u = Variable(batch['u'])
i = Variable(batch['i'])
j = Variable(batch['j'])

the memory still increases.

This makes me pretty sure that the problem comes from the DataLoader.
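
For reference, a minimal way to check this (a sketch, assuming data and Dataset_triplet as posted above) is to index the Dataset directly and skip the DataLoader entirely:

# bypass the DataLoader and index the Dataset by hand;
# if memory stays flat here, the growth comes from the DataLoader itself
dataset = Dataset_triplet(data)
for idx in range(len(dataset)):
    sample = dataset[idx]  # just touch each sample, do nothing with it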

Hm, and are you really sure that it’s increasing in the for-loop over the batches and not in the for-loop over the epochs? The only thing I can see right now is that you are not closing the pickle file object (I’m not sure whether that’s still necessary, but personally I always use a “with” context manager when dealing with files).
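
I.e., something like this (only the file handling changed, everything else as in your code):

# open the pickle file through a context manager so the handle is closed right away
with open(filename, 'rb') as f:
    data = pkl.load(f)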

I tried opening the file with the “with” context manager, but it didn’t help.

I am sure that the problem happens in the for-loop over the batches (I checked the memory usage with psutil).
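
Roughly along these lines (a sketch of the kind of per-batch measurement, not my exact code):

import os
import psutil

process = psutil.Process(os.getpid())

for batch_idx, batch in enumerate(train_loader):
    rss_mb = process.memory_info().rss / (1024 ** 2)
    print('batch %d: %.1f MB resident' % (batch_idx, rss_mb))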

Hm, so it’s the regular RAM, not the GPU memory? I had some issues with GPU memory and the batch loader when runs got aborted (the process kept running in the background, and PyTorch doesn’t free the GPU memory if a run gets aborted due to an out-of-memory error). It’s probably not related, since you mention the memory increases during the run. Sorry, I have no idea what could cause that in your case.
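
If you want to double-check the GPU side as well, something like this should work on reasonably recent PyTorch versions (otherwise nvidia-smi from the shell gives the same information):

import torch

if torch.cuda.is_available():
    # bytes currently allocated by tensors on the default GPU
    print('allocated:', torch.cuda.memory_allocated())
    # bytes held by the caching allocator (can be larger than what is allocated)
    print('cached:', torch.cuda.memory_cached())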