I am having trouble with an increasing memory issue.
Every time I process a batch, memory usage increases and eventually I run out of memory.
I am sure that it is not related to the training itself, because even if I comment out the training part, memory keeps accumulating just from reading the batches.
Below is my code snippet.
data is a list of lists in the following format: [[1,2,3],[4,5,6],...]
for epoch in range(numEpoch):
    filename = 'somefile.pkl'
    data = pkl.load(open(filename, 'rb'))
    train_by_dataloader = Dataset_triplet(data)
    train_loader = DataLoader(dataset=train_by_dataloader, batch_size=self.batch_size, shuffle=True)
    for batch_idx, batch in enumerate(train_loader):
        u = Variable(batch['u'])
        i = Variable(batch['i'])
        j = Variable(batch['j'])
        # ... memory keeps on accumulating ...
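For reference, this is roughly how the growth can be observed per batch (a minimal sketch on my side, assuming psutil is installed; it is not part of the actual training code and reuses train_loader from the snippet above):

import os
import psutil  # assumption: psutil is available, only used to read the process RSS

proc = psutil.Process(os.getpid())
for batch_idx, batch in enumerate(train_loader):
    u = Variable(batch['u'])
    i = Variable(batch['i'])
    j = Variable(batch['j'])
    if batch_idx % 100 == 0:
        # resident set size of this Python process, in MB
        print('batch %d: RSS %.1f MB' % (batch_idx, proc.memory_info().rss / 1e6))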
and my Dataset (the one fed to the DataLoader) is
class Dataset_triplet(Dataset):
    def __init__(self, totalData):
        self.totalData = totalData

    def __len__(self):
        return len(self.totalData)

    def __getitem__(self, idx):
        # index with [idx][col] so it also works for a plain list of lists (not only numpy arrays)
        result = {'u': self.totalData[idx][0], 'i': self.totalData[idx][1], 'j': self.totalData[idx][2]}
        return result
I am guessing that my DataLoader has some problem…
Hm, and are you really sure that it’s increasing in the for-loop over the batches and not the for-loop over the epochs? Because the only thing I can see right now is that you are not closing the pickle file object (not sure if that’s still necessary, but personally I always use “with” context managers when dealing with files).
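For reference, the context-manager version of that load would look roughly like this (just a sketch of the same pickle call from the snippet above):

import pickle as pkl

with open(filename, 'rb') as f:  # the file object is closed automatically when the block exits
    data = pkl.load(f)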
Hm, so it’s the regular RAM and not the GPU memory? I had some issues with GPU memory & the batch loader when runs got aborted (the process kept running in the background, and PyTorch doesn’t free the GPU memory if the run gets aborted due to an out-of-memory error). It’s probably not related, since you mention the memory increases during the run. Sorry, I have no idea what could cause that in your case.
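To rule out the GPU side, something like this could be printed inside the batch loop (a minimal sketch, assuming CUDA is used and a PyTorch version that exposes the memory counters):

import torch

if torch.cuda.is_available():
    # bytes currently allocated by tensors on the default GPU, reported in MB
    print('GPU allocated: %.1f MB' % (torch.cuda.memory_allocated() / 1e6))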