I started using PyTorch recently, and I'm very impressed so far; thank you for making it! I installed PyTorch via pip and am using Python 2.7.6.
Original Post, Disregard
I noticed steadily increasing memory usage during training a CNN. Here is my training step:
```python
def training_step(self, trainingdata, traininglabels):
    self.optimizer.zero_grad()  # zero the gradient buffers
    prepared_data = self.prepare_data(trainingdata, (len(trainingdata), self.channels_in, self.height, self.width))
    indices = np.nonzero(traininglabels)[1]  # turn one-hot labels into class indices (how PyTorch expects them)
    prepared_indices = self.prepare_data(indices, np.shape(indices), dtype=np.int64)  # int64 gives a LongTensor, which PyTorch wants for labels
    output = self(prepared_data).cuda()
    error = self.loss(output, prepared_indices).cuda()
    error.backward()
    self.optimizer.step()  # does the update

def prepare_data(self, data, shape, dtype=np.float32):  # prepare train or val/test data for a pass through the network
    reshaped = data.reshape(shape).astype(dtype)  # correct shape, correct dtype
    reshaped = torch.from_numpy(reshaped)  # make a torch tensor
    return Variable(reshaped.cuda())  # wrap it in a Variable on the GPU
```
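As an aside, the one-hot-to-index conversion in `training_step` can be checked in isolation: for strictly one-hot rows, `np.argmax` along axis 1 gives the same result as `np.nonzero(...)[1]` and arguably reads more clearly (a small standalone check with made-up labels):

```python
import numpy as np

# Hypothetical one-hot label batch: 4 samples, 3 classes.
one_hot = np.array([
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
])

# The conversion used in training_step: column indices of the nonzero entries.
indices_nonzero = np.nonzero(one_hot)[1]

# Equivalent for one-hot rows (exactly one nonzero entry per row):
indices_argmax = np.argmax(one_hot, axis=1)

assert (indices_nonzero == indices_argmax).all()
print(indices_nonzero)  # [1 0 2 1]
```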
If I comment out

```python
output = self(prepared_data).cuda()
error = self.loss(output, prepared_indices).cuda()
error.backward()
self.optimizer.step()  # does the update
```
the leak does not occur. Therefore, the leak occurs in those lines.
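For anyone trying to reproduce this, one simple way to quantify host-side growth per epoch is to log the process's peak resident set size between epochs. A rough sketch using the stdlib `resource` module (Unix-only; `train_one_epoch` is a stand-in for the real loop, and note that `ru_maxrss` is a high-water mark, so it only ever grows — which is fine for confirming a leak but cannot show memory being released):

```python
import resource

def rss_mb():
    # ru_maxrss is in KB on Linux (bytes on macOS); assuming Linux here.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

def train_one_epoch():
    # Placeholder for the real loop over all training batches.
    pass

for epoch in range(3):
    before = rss_mb()
    train_one_epoch()
    after = rss_mb()
    print("epoch %d: %.1f MB -> %.1f MB (delta %.1f MB)"
          % (epoch, before, after, after - before))
```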
If I add, as suggested in "Tracking down a suspected memory leak",

```python
torch.backends.cudnn.enabled = False
```
performance is degraded by approximately 40%, but the leak is less severe (approximately 50 MB/epoch instead of 200 MB/epoch). If I then add

```python
del output
del error
```

to the end of my training step, the memory leak seems to go away almost entirely (approximately 3 MB/epoch)! However, if I add the `del` lines but not the `enabled = False` line, the leak is still present.
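For context on why `del` can matter at all: in Python, the names `output` and `error` keep their objects (and, through `error`, the whole autograd graph) alive until they are reassigned on the next iteration, so one extra copy of everything lingers between steps. A minimal sketch of that lifetime behaviour, using `weakref` on a plain object in place of a tensor:

```python
import gc
import weakref

class FakeTensor(object):
    """Stand-in for a large tensor / autograd-graph object."""
    pass

t = FakeTensor()
probe = weakref.ref(t)      # lets us observe when the object actually dies

assert probe() is not None  # still alive: the name 't' holds a strong reference

del t                       # drop the last strong reference
gc.collect()                # not strictly needed in refcounted CPython, but explicit
assert probe() is None      # the object has been reclaimed
print("object freed after del")
```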
Is this a bug, or am I missing some sort of clean-up step in my code? Am I supposed to use the `del` lines? Is there some kind of work-around that would let me keep the better performance (available without the `enabled = False` line) without the memory leak?
Never mind, disregard the above; I recorded some of my tests incorrectly.

```python
torch.backends.cudnn.enabled = False
```

does completely solve the leak by itself. However, it is about 40% slower. Is there any way I can get the speed without the memory leak? Thanks!
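One generic way to verify whether a workaround really works is to count live objects of the suspect type between epochs with `gc.get_objects()`. With PyTorch you would filter on `torch.Tensor` (or `torch.autograd.Variable` in these older versions); the sketch below uses a dummy class so it runs standalone:

```python
import gc

class FakeTensor(object):
    """Dummy stand-in; with PyTorch, filter on torch.Tensor instead."""
    pass

def count_live(cls):
    # Number of gc-tracked objects of the given type still alive.
    return sum(1 for obj in gc.get_objects() if isinstance(obj, cls))

baseline = count_live(FakeTensor)
held = [FakeTensor() for _ in range(100)]      # simulate objects retained across epochs
assert count_live(FakeTensor) == baseline + 100

del held
gc.collect()
assert count_live(FakeTensor) == baseline
print("no FakeTensor instances leaked")
```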