Questions about memory: freeing local variables, gradient information

I am working on a project for which GPU memory is the bottleneck, and I’m trying to get a better grasp on how memory works for PyTorch. I have two questions:

  1. How is memory allocated and freed for local variables inside a model? Suppose a is a very large tensor; what happens if I do:

b = a + 1
c = b + 1
# use c for something; a and b are never used again

Are a, b, and c all in memory at the same time? Or are a and b immediately freed once they are no longer needed? Is it helpful to explicitly del a after the assignment of b?
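
For context, here is how I have been trying to observe this myself (a minimal sketch that assumes a CUDA device is available; the sizes are just placeholders):

import torch

device = torch.device("cuda")

a = torch.randn(1000, 1000, device=device)  # the "very large" tensor
print(torch.cuda.memory_allocated(device))  # memory held by a

b = a + 1  # new allocation; a is still referenced
c = b + 1  # new allocation; a and b are still referenced
print(torch.cuda.memory_allocated(device))  # roughly three tensors' worth

del a, b  # drop the Python references explicitly
print(torch.cuda.memory_allocated(device))  # only c should still be accounted for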

  2. Does zero_grad() mean that variables no longer retain gradient information? The last response in Best Practices for Maximum GPU utilization seems to imply that if I do

optimizer.zero_grad()
loss = model(x)
loss.backward()
optimizer.step()

inside a for loop, the loss variable will keep things in memory until it is assigned to again. Is that really the case? Shouldn’t zero_grad remove any references here?
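
For concreteness, this is the kind of loop I mean (model, criterion, optimizer, and loader are just placeholder names standing in for my actual setup):

for x, y in loader:
    optimizer.zero_grad()      # only zeroes the .grad buffers of the parameters
    out = model(x)
    loss = criterion(out, y)   # loss holds a reference to the autograd graph
    loss.backward()            # frees most of the graph's intermediate buffers
    optimizer.step()

    loss_value = loss.item()   # .item() gives a plain Python float with no graph attached
    del loss, out              # otherwise the names stay bound until the next iteration rebinds them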

Thank you for your help.


Also, another question: certain operations only change the view of a tensor rather than the underlying storage. What happens when we do tensor operations on those tensors? Are we doing operations on the view, allocating new memory of the same shape as the underlying storage, or allocating new memory of the shape of the view?

For example, I have a tensor a and do b = a.expand(20) (a view, so b still uses the same underlying storage as a; repeat would make a copy). Now, given another tensor c, I do d = torch.cat((b, c)). What memory does d occupy?
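
Here is the kind of check I have in mind (a small sketch; as far as I understand, expand returns a view while repeat makes a copy, which is why I use expand below):

import torch

a = torch.randn(1)
b_view = a.expand(20)   # a view: no new storage is allocated
b_copy = a.repeat(20)   # a copy: 20 elements of fresh storage

print(a.data_ptr() == b_view.data_ptr())  # True  - same underlying memory
print(a.data_ptr() == b_copy.data_ptr())  # False - repeat materialized a copy

c = torch.randn(20)
d = torch.cat((b_view, c))  # cat writes into a new contiguous buffer
print(d.shape, d.data_ptr() == b_view.data_ptr())  # torch.Size([40]) False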

I still don’t know the answer to these questions, and they seem important for understanding how to write PyTorch code. I looked through an answer on the forum as well as the PyTorch documentation (e.g. the memory best practices page) and didn’t find anything. If someone knows the answer, or has a pointer to where I can find out myself, help would be very welcome!