Freeing extra memory from intermediate tensors after training

I’m running a memory report before training and after training a single epoch, and noticed extra tensors being allocated in memory. Is there any way to get rid of these? Saving the optimizer and deleting the in-memory copy helps some, but I can’t get rid of the extra tensors that get created as a result of training. Switching the model to eval() mode doesn’t free them.

I am running the memory report from this link: A simple Pytorch memory usages profiler · GitHub

mem_report()  # report before training
for epoch in range(1, self.args.epochs + 1):
    epoch_loss = self.train()
    self.train_loss = epoch_loss
    test_loss, test_acc = self.test()
    ...
    mem_report()  # report again after each epoch

Before training:

Element type	Size			Used MEM(MBytes)
Storage on GPU
-----------------------------------------------------------------
Parameter		(1024, 784)		3.06
Parameter		(10, 1024)		0.04
-----------------------------------------------------------------
Total Tensors: 813056 	Used Memory Space: 3.10 MBytes

After training one epoch:

Element type	Size			Used MEM(MBytes)
Storage on GPU
-----------------------------------------------------------------
Tensor		(1024, 784)		3.06
Tensor		(1024, 784)		3.06
Tensor		(1024, 784)		3.06
Tensor		(10, 1024)		0.04
Tensor		(10, 1024)		0.04
Tensor		(10, 1024)		0.04
Parameter		(10, 1024)		0.04
Parameter		(1024, 784)		3.06
-----------------------------------------------------------------
Total Tensors: 3252224 	Used Memory Space: 12.41 MBytes

I guess the additional tensors are allocated to store the intermediate forward activations, which are needed for the gradient computation. If you don’t want to train your model, you can disable this by wrapping the forward pass in a torch.no_grad() or torch.inference_mode() guard.
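For example, a minimal self-contained sketch (a toy model matching the parameter shapes in the report, not your actual code):

import torch
import torch.nn as nn

# toy model with the same parameter shapes as in the report above (illustrative only)
model = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
x = torch.randn(32, 784)

model.eval()
with torch.no_grad():        # or: with torch.inference_mode():
    out = model(x)           # forward pass stores no intermediate activations

print(out.requires_grad)     # False: no autograd graph was built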

Is there any way to free these after training? In the second call to mem_report above, the model is in eval mode, so I don’t need these extra tensors at that point.
I’m iteratively training many models in a loop and I’m running into memory issues when I try to train a subsequent model, because the first model is holding onto these extra tensors. Additionally, I’ll need to train each model several times.

That’s not the case, since the model.train() and .eval() modes do not influence the gradient calculation, but change the behavior of certain layers (e.g. dropout is disabled during model.eval()). To disable the gradient calculation, use the aforementioned guards.
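To illustrate the difference (a small standalone sketch, not your model):

import torch
import torch.nn as nn

model = nn.Linear(784, 1024)
x = torch.randn(8, 784)

model.eval()
out = model(x)
print(out.requires_grad)   # True: eval() alone does not disable autograd

with torch.no_grad():
    out = model(x)
print(out.requires_grad)   # False: the guard disables graph construction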

Thanks a bunch for your help. These tensors are still showing up in the report even with the guard. The test method only runs forward passes on the model. I see the same behavior with inference_mode().

with torch.no_grad():
    test_loss, test_acc = self.test()
    mem_report()
    print(torch.sum(self.model.fc1.weight.grad.abs()))  # > 0: the grad from the earlier training step is still allocated

The grad is one of the tensors in my list that I’m looking to free. The snippet above runs after the training code where the new tensors get allocated. Is there any way to automatically free these after training, without manually setting each one to None?

The intermediate activations created during training are freed in the backward operation. Gradients (i.e. the .grad attributes of the parameters) can be freed by calling .zero_grad(set_to_none=True) on the model or the optimizer.
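A minimal example of the second point (toy model, not your code):

import torch
import torch.nn as nn

model = nn.Linear(784, 1024)
model(torch.randn(8, 784)).sum().backward()

print(model.weight.grad.shape)       # torch.Size([1024, 784]): the grad is allocated

model.zero_grad(set_to_none=True)    # the same call exists on optimizers
print(model.weight.grad)             # None: the grad tensor has been freed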

Deleting the optimizer gets rid of a few of the extra tensors in memory; however, the weight.grad tensors still exist unless you run .zero_grad(set_to_none=True) after a training loop. In my case I need to do both explicitly. Thanks for the help @ptrblck
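In case it helps others, the per-model cleanup now looks roughly like this (a sketch using my own attribute names, adjust to your setup):

self.model.zero_grad(set_to_none=True)  # frees the parameters' .grad tensors
del self.optimizer                      # frees the optimizer state held in memory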