The returned Tensors will not.
The original Tensor you called detach() on remains unchanged.
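A quick sketch of that behaviour (the tensor x here is just made up for illustration):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.detach()           # the returned tensor does not require grad
print(y.requires_grad)   # False
print(x.requires_grad)   # True -- the original tensor is unchanged
```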
Hi @albanD , could you please help me understand why torch.no_grad() should consume less memory?
I understand that the output of every operation inside the torch.no_grad() block will have requires_grad=False, no matter what the inputs' requires_grad looks like.
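For instance, here is a minimal sketch of what I mean (tensor names made up):

```python
import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = x * 2            # the input requires grad, but...
print(y.requires_grad)   # False -- outputs created inside the block never require grad
```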
This essentially means no intermediate tensors will be saved, which saves memory; but how is detach() different in this sense?
A detached tensor also has requires_grad=False. This also means no gradients need to be backpropagated through it, so no intermediate tensors are saved.
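Again a made-up sketch of what I have in mind:

```python
import torch

x = torch.randn(3, requires_grad=True)
z = (x * 2).detach()     # the detached result does not require grad
w = z * 3                # nothing downstream tracks gradients either
print(z.requires_grad, w.requires_grad)  # False False
print(w.grad_fn)         # None -- no backward graph, so no intermediate tensors are kept
```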
What am I missing? I'm also thinking the memory-consumption comparison between the two will depend on the specific code at hand, or is there a general comparison as well?
They are the same, really. It's just that if you use .detach(), you have to do that for every op, while if you use the context manager, you can disable it for the whole block.
So you should use one or the other depending on what is most convenient for your particular use case.
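For example, a rough sketch of the two styles (the toy model here is made up, not from the original question):

```python
import torch
import torch.nn as nn

# made-up toy model, just for illustration
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
x = torch.randn(2, 4)

# with .detach() you detach after every op, since the linear layers'
# parameters require grad and their outputs would otherwise track a graph
h = model[0](x).detach()
h = model[1](h).detach()
out1 = model[2](h).detach()

# with the context manager, autograd is disabled for the whole block at once
with torch.no_grad():
    out2 = model(x)

print(out1.requires_grad, out2.requires_grad)  # False False
```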