Memory Usage/Leak

I started using PyTorch recently and I'm very impressed so far; thank you for making it! I installed PyTorch via pip and am using Python 2.7.6.

Original Post, Disregard

I noticed steadily increasing memory usage while training a CNN. Here is my training step:

import numpy as np  # module-level imports
import torch
from torch.autograd import Variable

def training_step(self, trainingdata, traininglabels):
    self.optimizer.zero_grad()  # zero the gradient buffers
    prepared_data = self.prepare_data(trainingdata, (len(trainingdata), self.channels_in, self.height, self.width))
    indices = np.nonzero(traininglabels)[1]  # turn one-hot labels into class indices (how PyTorch wants them)
    prepared_indices = self.prepare_data(indices, np.shape(indices), dtype=np.int64)  # int64 gives a LongTensor, which is what PyTorch expects for labels
    output = self(prepared_data).cuda()
    error = self.loss(output, prepared_indices).cuda()
    error.backward()
    self.optimizer.step()  # does the update

def prepare_data(self, data, shape, dtype=np.float32):  # prepare train or val/test data for a pass through the network
    reshaped = data.reshape(shape).astype(dtype)  # correct shape, correct dtype
    reshaped = torch.from_numpy(reshaped)  # make a torch tensor
    return Variable(reshaped.cuda())  # move it to the GPU and wrap it in a Variable

If I comment out

    output = self(prepared_data).cuda()
    error = self.loss(output, prepared_indices).cuda()
    error.backward()
    self.optimizer.step()  # does the update

the leak does not occur, so the leak must be coming from those lines.
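
(For reference, I'm tracking the per-epoch growth roughly like this; just a rough sketch of what I do, and it assumes GPU 0 and that nvidia-smi is on the PATH:)

import subprocess

def gpu_memory_used_mb(device=0):
    # ask nvidia-smi for the current memory usage of one GPU, in MB
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=memory.used",
         "--format=csv,noheader,nounits", "-i", str(device)])
    return int(out.strip())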

If I add, as suggested in Tracking down a suspected memory leak,

torch.backends.cudnn.enabled = False

performance is degraded by approximately 40%, but the leak is less severe (approximately 50 MB/epoch instead of 200 MB). If I then add

    del output
    del error

to the end of my training step, the memory leak seems to go away almost entirely (approx. 3 MB/epoch). However, if I add the del lines but not the enabled = False line, the leak is still present.
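
(i.e. the end of training_step then looks like this:)

    error.backward()
    self.optimizer.step()  # does the update
    del output  # explicitly drop references to the graph outputs
    del error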

Is this a bug, or am I missing some sort of clean-up step in my code? Am I supposed to use the del lines? Is there some kind of workaround that would let me keep the better performance (the performance I get without the enabled = False line) without the memory leak?


Never mind, disregard the above; I recorded some of my tests incorrectly. Setting

torch.backends.cudnn.enabled = False

does completely solve the leak by itself. However, it is about 40% slower. Is there any way I can get the speed without the memory leak?
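
(The ~40% figure is just from timing whole epochs, roughly along these lines; a minimal sketch where run_epoch stands in for my actual training loop:)

import time

def average_epoch_time(run_epoch, n_epochs=5):
    # run_epoch is a placeholder for one full pass over the training set
    start = time.time()
    for _ in range(n_epochs):
        run_epoch()
    return (time.time() - start) / n_epochs  # average seconds per epoch

Thanks!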

The bidirectional LSTM leak, which you linked, seemed to be happening to me with cuDNN 6 but not cuDNN 5. My understanding was that it was not so much the cuDNN code itself that caused it, but rather something in the build environment (not that the result is terribly different for us).
Unfortunately, something was up with my post in the NVidia forums back then (it never appeared there publicly, even though I could see it when logged in), and I have not revisited it.
I don't know if similar things apply to CNNs.
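
In case it helps narrow things down, you could check which cuDNN your build actually uses (just a quick diagnostic, not a fix):

import torch.backends.cudnn

print(torch.backends.cudnn.version())  # e.g. something like 6021 for cuDNN 6.x, 5105 for cuDNN 5.1
print(torch.backends.cudnn.enabled)    # whether cuDNN is currently enabled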

Best regards

Thomas