PyTorch Computation Graph not getting freed

AdityaAS · July 23, 2018, 11:34pm

Hi,

I’m training a neural network in PyTorch and am hitting a CUDA out-of-memory error. The reason seems to be that the computation graph created by PyTorch is not freed after optimizer.step() and before the loss for the next batch is calculated. Here are the details:

PyTorch version: 0.4.0 (stable)
GPU: NVIDIA 1080Ti
CUDA Version: 9.0

Model Details:
Model contains two parameters (embeddings) of shapes as follows:
Embedding Param1 = 704990 x dim_size
Embedding Param2 = 957760 x dim_size

Memory details for dim_size=50
CUDA Initialization: 584 MB / 11172 MB
MODEL + CUDA: 904 MB / 11172 MB
FORWARD PASS: 1010 MB / 11172 MB
BACKWARD PASS: 1986 MB / 11172 MB
OPTIMIZER STEP: 2354 MB / 11172 MB
After del loss, del batch: 1962 MB / 11172 MB

Memory details for dim_size=300
CUDA Initialization: 584 MB / 11172 MB
MODEL + CUDA: 2490 MB / 11172 MB
FORWARD PASS: 2514 MB / 11172 MB
BACKWARD PASS: 8232 MB / 11172 MB
OPTIMIZER STEP: 10428 MB / 11172 MB
After del loss, del batch: 8228 MB / 11172 MB
FORWARD PASS on second batch: CUDA OOM

With dim_size=300, the run goes out of memory as soon as the second batch is loaded and backward is called on it, because 8228 MB plus roughly 5 GB is more than the GPU’s memory.
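As a rough sanity check on the figures above, the size of the two embedding tables can be estimated from their shapes. This is a minimal sketch that assumes the parameters are stored as float32 and that the reported numbers are in MiB; the helper name `embedding_mib` is made up for illustration.

```python
# Row counts of the two embedding tables quoted in the post.
ROWS = (704990, 957760)
BYTES_PER_FLOAT32 = 4  # assumption: parameters are stored as float32


def embedding_mib(dim_size):
    """Return the combined size in MiB of both embedding tables."""
    elements = sum(ROWS) * dim_size
    return elements * BYTES_PER_FLOAT32 / 2**20


# Compare the estimate with (MODEL + CUDA) minus (CUDA Initialization).
for dim_size, reported in ((50, 904 - 584), (300, 2490 - 584)):
    estimate = embedding_mib(dim_size)
    print(f"dim_size={dim_size}: estimated {estimate:.0f} MiB, "
          f"reported increase {reported} MB")
```

Under these assumptions the estimates come out close to the reported increases (about 317 vs 320 for dim_size=50, and about 1903 vs 1906 for dim_size=300). Every additional full-size buffer tied to these parameters, for example a dense gradient, would cost roughly the same amount again.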

Currently, after loss.backward() and optimizer.step(), I run the following to free up memory:
del loss
del input_batch
torch.cuda.empty_cache()

Is there anything that can be done to make sure that the computation graph is completely deleted, so that before the second batch loads, the memory PyTorch occupies on the GPU is the same as just before the forward pass on the first batch?

aplassard · July 23, 2018, 11:51pm

Could you post your code that’s causing this?

AdityaAS · July 24, 2018, 1:20am

# text_input = batch_size x 2000
# entity_input = batch_size x 31 (1 pos, 30 neg samples)
# entity_embedding.shape[1] = 300
# linear W = 300x300, linear Bias = 300
def forward(self, text_input, entity_input):
    sentence_embedding = self.linear(F.normalize(torch.sum(self.word_embedding(text_input), dim=1), dim=-1))
    denominator = logsumexp(torch.sum(self.entity_embedding(entity_input) * sentence_embedding.unsqueeze(1), 2), 1)
    numerator = torch.sum(self.entity_embedding(entity_input)[:, 0, :] * sentence_embedding, 1)
    return torch.sum(denominator - numerator)

AdityaAS · July 24, 2018, 1:22am

Training loop

    words = torch.from_numpy(word_file.root.data[start:start+batch_size, :].astype(int))
    entities = torch.from_numpy(entity_file.root.data[start:start+batch_size, :].astype(int))
    optimizer.zero_grad()
    words.cuda()
    entities.cuda()

    loss = model(words, entities)
    loss.backward(retain_graph=False)
    optimizer.step()

    del loss
    del words
    del entities
    torch.cuda.empty_cache()
    batchno = batchno + 1
    start = start + batch_size

ptrblck · July 24, 2018, 8:11am

Your training loop looks a bit strange: you are not assigning the results of words.cuda() and entities.cuda() back, so both tensors should still be on the CPU.
Could you check the device of these tensors?
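The point about assigning the result back can be illustrated with a small sketch. Calling .cuda() or .to(device) on a tensor returns a new tensor and leaves the original where it was. The tensor shape below is only a stand-in for the real batch, and the CPU fallback is an assumption added so the snippet also runs on a machine without a GPU.

```python
import torch

# Use the GPU when one is present; fall back to CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in batch, created on the CPU like the output of torch.from_numpy.
words = torch.zeros(4, 2000, dtype=torch.long)

words.to(device)          # result is discarded, `words` itself is unchanged
print(words.device)       # cpu

words = words.to(device)  # rebind the name to the returned tensor
print(words.device)       # the selected device
```

Printing `tensor.device` right before the forward pass is a quick way to answer the question above.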

AdityaAS · July 24, 2018, 10:04pm

The input tensors are on the same GPU that I’m running the model on.

AdityaAS · July 25, 2018, 9:08am

Any advice on how I can make sure that the graph is definitely deleted? torch.cuda.empty_cache() isn’t working in this case.

ptrblck · July 25, 2018, 10:17am

As long as some object holds a reference to the graph, it cannot be freed.
Make sure you are not storing tensors that are attached to it, e.g. loss, in a list or other container.
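A minimal sketch of the difference, using a small hypothetical linear model in place of the real one: appending the loss tensor itself keeps a reference to its grad_fn, and through it to the autograd graph, while loss.item() copies the scalar out as a plain Python float with no graph attached.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 1)  # hypothetical stand-in for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

kept_tensors = []  # anti-pattern: every entry still points at a graph
kept_values = []   # plain floats, nothing from autograd is referenced

for _ in range(3):
    batch = torch.randn(16, 8)
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()
    loss.backward()
    optimizer.step()

    kept_tensors.append(loss)        # holds loss.grad_fn, i.e. the graph nodes
    kept_values.append(loss.item())  # only the scalar value is stored

print(kept_tensors[0].grad_fn is not None)  # True
print(type(kept_values[0]))                 # <class 'float'>
```

If the tensor itself is needed later, loss.detach() gives a tensor with the same value that no longer references the graph.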