Torch.no_grad() vs detach()

I understand that inside a no_grad() block, autograd does not record the computation graph, which is similar to temporarily setting requires_grad to False, whereas the detach() function returns a tensor that is detached from the computation graph.
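This understanding can be checked directly. A minimal sketch contrasting the two (variable names are illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)

# Inside no_grad, new results are not tracked at all
with torch.no_grad():
    y = x * 2
assert y.requires_grad is False

# detach() returns a new tensor that shares storage with
# the original but is cut out of the computation graph
z = (x * 2).detach()
assert z.requires_grad is False

# The same computation outside no_grad is still tracked
w = x * 2
assert w.requires_grad is True
assert w.grad_fn is not None
```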

My question is: is there any situation where using detach() is actually necessary? It seems to me that everything can always be done with no_grad() instead.


There is quite a bit of fine print to the rough claim that "they have the same effect". For example:

  • even in no_grad mode, views are still tracked (and have requires_grad set if the tensor they are a view of does).
  • detach() is the more versatile operation, because it lets you control exactly which tensors should not receive gradients (e.g. if you want to train only the last few layers for fine-tuning).
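The first point can be seen in a small experiment: a view created under no_grad still inherits requires_grad from its base tensor, even though no backward node is recorded (a sketch; names are illustrative):

```python
import torch

x = torch.randn(3, requires_grad=True)

with torch.no_grad():
    y = x * 2   # ordinary op: result is cut from the graph
    v = x[:2]   # view: still registered as a view of x

assert y.requires_grad is False
assert v.requires_grad is True   # inherits requires_grad from its base
assert v.grad_fn is None         # but no backward node was recorded
```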
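As an illustration of the second point, a hypothetical two-part model where detach() cuts the graph between a frozen backbone and a trainable head, so only the head receives gradients (the module names here are made up for the example):

```python
import torch
import torch.nn as nn

backbone = nn.Linear(4, 8)  # pretrained part we want to freeze
head = nn.Linear(8, 2)      # the only part we want to train

x = torch.randn(5, 4)

# detach() severs the graph at this exact tensor, so backprop
# stops here and never reaches the backbone's parameters
features = backbone(x).detach()
out = head(features)
out.sum().backward()

assert head.weight.grad is not None   # head was trained
assert backbone.weight.grad is None   # backbone got no gradients
```

With no_grad() alone you would have to know, at the time the backbone runs, that its output should be untracked; detach() lets you make that decision per tensor, after the fact.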