The returned Tensors will not.
The original Tensor you called detach() on remains unchanged.
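A quick sketch of that behaviour (the tensor x here is just made up for illustration):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x.detach()           # the returned tensor does not require grad
print(y.requires_grad)   # False
print(x.requires_grad)   # True -- the original tensor is unchanged
```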
Hi @albanD , could you please help me understand why torch.no_grad() should consume less memory?
I understand that the output of every operation inside the torch.no_grad() block will have requires_grad=False, no matter what the inputs' requires_grad looks like.
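For instance, here is a minimal sketch of what I mean (tensor names made up):

```python
import torch

x = torch.randn(3, requires_grad=True)
with torch.no_grad():
    y = x * 2            # the input requires grad, but...
print(y.requires_grad)   # False -- outputs created inside the block never require grad
```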
This essentially means no intermediate tensors will be saved, which saves memory; but how is detach() different in this sense?
A detached tensor also has requires_grad=False. This also means no gradients need to be backpropagated through it, so no intermediate tensors are saved.
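Again a made-up sketch of what I have in mind:

```python
import torch

x = torch.randn(3, requires_grad=True)
z = (x * 2).detach()     # the detached result does not require grad
w = z * 3                # nothing downstream tracks gradients either
print(z.requires_grad, w.requires_grad)  # False False
print(w.grad_fn)         # None -- no backward graph, so no intermediate tensors are kept
```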
What am I missing? I'm also thinking the memory-consumption comparison between the two will depend on the specific code at hand, or is there a general comparison as well?
They are the same, really. It's just that if you use .detach(), you have to do that for every op, while if you use the context manager, you can disable it for the whole block.
So you should use one or the other depending on what is most convenient for your particular use case.
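For example, a rough sketch of the two styles (the toy model here is made up, not from the original question):

```python
import torch
import torch.nn as nn

# made-up toy model, just for illustration
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 1))
x = torch.randn(2, 4)

# with .detach() you detach after every op, since the linear layers'
# parameters require grad and their outputs would otherwise track a graph
h = model[0](x).detach()
h = model[1](h).detach()
out1 = model[2](h).detach()

# with the context manager, autograd is disabled for the whole block at once
with torch.no_grad():
    out2 = model(x)

print(out1.requires_grad, out2.requires_grad)  # False False
```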