Memory increase while reading batches

I am having trouble with a steadily increasing memory issue.

Every time I process a batch, memory usage increases, and eventually I run out of memory.

I am sure it is not related to the training process, because even if I comment out the training part, memory keeps accumulating just from reading the batches.

Below is my code snippet.

data is a list of lists in the following format:
[[1,2,3],[4,5,6], ...]

for epoch in range(numEpoch):
    filename = 'somefile.pkl'

    data = pkl.load(open(filename, 'rb'))
    train_by_dataloader = Dataset_triplet(data)
    train_loader = DataLoader(dataset=train_by_dataloader, batch_size=self.batch_size, shuffle=True)

    for batch_idx, batch in enumerate(train_loader):
        u = Variable(batch['u'])
        i = Variable(batch['i'])
        j = Variable(batch['j'])

        # ... memory keeps accumulating here ...

and my Dataset class (the one I pass to the DataLoader) is

class Dataset_triplet(Dataset):
    def __init__(self, totalData):
        self.totalData = totalData

    def __len__(self):
        return len(self.totalData)

    def __getitem__(self, idx):
        # each row of totalData is a [u, i, j] triple
        row = self.totalData[idx]
        return {'u': row[0], 'i': row[1], 'j': row[2]}

I am guessing that my DataLoader has some problem…

Can anyone help me out?

Hm, I can’t see an issue right now, but for debugging purposes, have you tried to just iterate over the tensors without casting them to Variables?
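
For example, something like this (a minimal sketch reusing your train_loader, nothing else changed):

# read the batches only, with no Variable wrapping and no model calls,
# to check whether memory still grows from the DataLoader alone
for batch_idx, batch in enumerate(train_loader):
    u = batch['u']  # plain tensors as returned by the DataLoader
    i = batch['i']
    j = batch['j']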

Can you post the missing code, please?
Are you somehow storing the loss from your calculations?
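
I mean something along these lines (a made-up minimal example, not your code), where keeping the loss Variable itself retains its whole computation graph:

import torch
from torch.autograd import Variable

x = Variable(torch.randn(10), requires_grad=True)

losses_bad, losses_ok = [], []
for step in range(100):
    loss = (x * 2).sum()
    losses_bad.append(loss)        # keeps the Variable plus its graph alive -> memory grows
    losses_ok.append(loss.item())  # keeps only a plain Python float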

Even if I did, that shouldn’t be a problem for now because the memory increases even without that part.

Anyway, the rest of the training code looks like this:

def training(self):
    model = modeler(num1, num2, K)
    if self.cuda_available:
        model = model.cuda(self.cuda)

    criterion = torch.nn.MarginRankingLoss(margin=self.margin)
    optimizer = optim.SGD(model.parameters(), lr=self.lRate)

    for epoch in range(numEpoch):
        filename = 'somefile.pkl'
        totalLoss = 0

        data = pkl.load(open(filename, 'rb'))
        train_by_dataloader = Dataset_triplet(data)
        train_loader = DataLoader(dataset=train_by_dataloader, batch_size=self.batch_size, shuffle=True)

        for batch_idx, batch in enumerate(train_loader):
            u = Variable(batch['u'])
            i = Variable(batch['i'])
            j = Variable(batch['j'])
            optimizer.zero_grad()

            pos, reg = model(u, i)
            neg, _ = model(u, j)

            if self.cuda_available:
                loss = criterion(pos, neg, Variable(torch.FloatTensor([-1])).cuda(self.cuda))
            else:
                loss = criterion(pos, neg, Variable(torch.FloatTensor([-1])))

            loss += self.reg1 * reg
            loss.backward()
            optimizer.step()

            totalLoss += loss.item()

I tried that but the memory still increases.

Actually, even if I comment out these lines

u = Variable(batch['u'])
i = Variable(batch['i'])
j = Variable(batch['j'])

the memory still increases.

This makes me pretty sure that the problem comes from the DataLoader.
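
For reference, a minimal way to check this (a sketch, assuming data and Dataset_triplet as posted above) is to index the Dataset directly and skip the DataLoader entirely:

# bypass the DataLoader and index the Dataset by hand;
# if memory stays flat here, the growth comes from the DataLoader itself
dataset = Dataset_triplet(data)
for idx in range(len(dataset)):
    sample = dataset[idx]  # just touch each sample, do nothing with it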

Hm, and are you really sure that it’s increasing in the for-loop over the batches and not in the for-loop over the epochs? The only thing I can see right now is that you are not closing the pickle file object (I’m not sure whether that’s still necessary, but personally I always use a “with” context manager when dealing with files).
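
I.e., something like this (only the file handling changed, everything else as in your code):

# open the pickle file through a context manager so the handle is closed right away
with open(filename, 'rb') as f:
    data = pkl.load(f)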

I tried opening the file with the “with” context manager, but it didn’t help.

I am sure that the problem happens in the for-loop over the batches (I checked the memory usage with psutil).
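
Roughly along these lines (a sketch of the kind of per-batch measurement, not my exact code):

import os
import psutil

process = psutil.Process(os.getpid())

for batch_idx, batch in enumerate(train_loader):
    rss_mb = process.memory_info().rss / (1024 ** 2)
    print('batch %d: %.1f MB resident' % (batch_idx, rss_mb))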

Hm, so it’s the regular RAM, not the GPU memory? I had some issues with GPU memory and the batch loader when runs got aborted (the process kept running in the background, and PyTorch doesn’t free the GPU memory if a run gets aborted due to an out-of-memory error). It’s probably not related, since you mention the memory increases during the run. Sorry, I have no idea what could cause that in your case.
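
If you want to double-check the GPU side as well, something like this should work on reasonably recent PyTorch versions (otherwise nvidia-smi from the shell gives the same information):

import torch

if torch.cuda.is_available():
    # bytes currently allocated by tensors on the default GPU
    print('allocated:', torch.cuda.memory_allocated())
    # bytes held by the caching allocator (can be larger than what is allocated)
    print('cached:', torch.cuda.memory_cached())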