I am working on a project where GPU memory is the bottleneck, and I’m trying to get a better grasp of how memory management works in PyTorch. I have two questions:
- How is memory allocated and freed for local variables inside a model? If I have a tensor a, and a is very large, what happens if I do:
b = a + 1
c = b + 1
# use c for something; a and b are never used again
Are a, b, and c all in memory at the same time, or are a and b freed as soon as they are no longer needed? Is it helpful to explicitly del a after the assignment of b? (The first sketch below shows the kind of check I have in mind.)
- Does zero_grad() mean that variables no longer retain gradient information? The last response in Best Practices for Maximum GPU utilization seems to imply that if I do
optimizer.zero_grad()
loss = model(x)
loss.backward()
optimizer.step()
inside a for loop, the loss variable will keep things in memory until it is assigned to again. Is that really the case? Shouldn't zero_grad() remove any references here? (The second sketch below is roughly the loop I have in mind.)
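To make the first question concrete, here is a minimal sketch of the check I have in mind (assuming a CUDA device; the tensor size and the report helper are placeholders I made up, and torch.cuda.memory_allocated() is just how I would read off the bytes held by live tensors):

```python
import torch

device = torch.device("cuda")

def report(label):
    # bytes currently occupied by live tensors, not the caching allocator's total reserve
    print(f"{label}: {torch.cuda.memory_allocated(device) / 1e6:.1f} MB")

a = torch.randn(4096, 4096, device=device)  # roughly 64 MB of float32
report("after a")

b = a + 1    # a is still reachable through the name a here
report("after b = a + 1")

c = b + 1    # are a, b, and c all resident at this point?
report("after c = b + 1")

del a, b     # explicitly drop the Python references
report("after del a, b")
```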
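For the second question, this is roughly the loop I mean, sketched with a toy model, an explicit criterion, and random data standing in for my real setup (the names and sizes here are made up for illustration); the print at the end is how I would watch whether loss keeps anything alive between iterations:

```python
import torch
from torch import nn

# tiny stand-ins for my real model, optimizer, and data
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss()

for step in range(5):
    x = torch.randn(64, 1024, device="cuda")
    y = torch.randn(64, 1, device="cuda")

    optimizer.zero_grad()          # zeroes (or clears) the parameters' .grad buffers only
    loss = criterion(model(x), y)  # loss still references the autograd graph here
    loss.backward()
    optimizer.step()

    # loss stays bound to this iteration's result until the next assignment;
    # does that keep part of the graph or the activations in GPU memory?
    print(f"step {step}: {torch.cuda.memory_allocated() / 1e6:.1f} MB allocated")
```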
Thank you for your help.