I am currently working on PEFT memory management, specifically for LoRA fine-tuning. When I wrap a model with the Hugging Face PEFT library, it freezes the backbone model's parameters by setting requires_grad=False.
My first question is: does PyTorch's memory management release activation tensors that are not needed during the backward pass? By "not needed," I mean activations that would only have been used to compute gradients for the backbone model's weights, which are now frozen.
If PyTorch retains these activation tensors in memory, how can I manually prune the unnecessary ones? Is it possible to set them to None directly?
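For reference, here is a minimal sketch of the setup I am describing (the model name and LoRA target modules are just placeholders, substitute your own):

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder backbone; any causal LM with q_proj / v_proj modules works the same way.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
lora_config = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)

# PEFT freezes the backbone: only the LoRA adapter weights keep requires_grad=True.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable[:4])                    # only lora_A / lora_B parameters show up
model.print_trainable_parameters()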
No additional memory should be used for the backward pass if requires_grad is False. From the Autograd documentation:
During the forward pass, an operation is only recorded in the backward graph if at least one of its input tensors require grad. During the backward pass (.backward()), only leaf tensors with requires_grad=True will have gradients accumulated into their .grad fields.
Also, if you ever want to manually release the graph (and the tensors saved in it) that a tensor is holding on to, you can detach it with some_tensor = some_tensor.detach().
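To make that concrete, here is a small sketch with toy tensors (not a real model) showing that an op whose inputs don't require grad records nothing, and that detach() drops a tensor's reference to the graph:

import torch

x = torch.randn(8, 512)                                 # "activation", requires_grad=False by default
w_frozen = torch.randn(512, 512, requires_grad=False)   # frozen "backbone" weight
w_adapter = torch.randn(512, 512, requires_grad=True)   # trainable "adapter" weight

y = x @ w_frozen
print(y.grad_fn)    # None -> nothing was recorded or saved for backward

z = x @ w_adapter
print(z.grad_fn)    # <MmBackward0 ...> -> the op was recorded and its needed inputs saved

z = z.detach()      # drop the reference to the graph; its memory can now be reclaimed
print(z.grad_fn)    # None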
Hi Brock_Brown!
Thank you for your reply!
Can I understand it this way: when an activation tensor is not needed to compute any gradient, it is not saved in the computation graph, and so PyTorch releases its memory once the forward pass no longer references it. In other words, PyTorch internally prunes the tensors that are not needed for the backward pass?
And another question: Is this achieved through Python’s garbage collection (gc)?
Yup, it will not appear in the computation graph. The memory is released as soon as there are no remaining references to the tensor in Python (reference counting frees it immediately; the garbage collector only steps in for reference cycles).
Forgot a very important part here: the GPU does not actually return memory to the driver unless you have deleted the tensors (or otherwise made sure there are no references to them) and then cleared the caching allocator with torch.cuda.empty_cache().
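A small sketch of what that looks like in practice (the tensor size is arbitrary). memory_allocated tracks live tensors, while memory_reserved tracks what the caching allocator is still holding:

import torch

assert torch.cuda.is_available()

t = torch.randn(4096, 4096, device="cuda")   # roughly 64 MB
print(torch.cuda.memory_allocated())          # includes t
print(torch.cuda.memory_reserved())           # blocks held by the caching allocator

del t                                         # no references left -> the tensor is freed
print(torch.cuda.memory_allocated())          # drops back down
print(torch.cuda.memory_reserved())           # still high: the block stays in the cache

torch.cuda.empty_cache()                      # return unused cached blocks to the driver
print(torch.cuda.memory_reserved())           # now this drops as well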