I know this is a very basic question, but it’s my first day with PyTorch and I can’t seem to figure it out. What is the difference between no_grad() and requires_grad? When should I use each of them, and when/how should I mix them?
with torch.no_grad() is a context manager that disables gradient tracking for the code inside its block.
It is usually used when you evaluate your model and don’t need to call backward() to calculate the gradients and update the corresponding parameters.
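As a minimal sketch of the evaluation case (the toy nn.Linear model and the random input are made up for illustration, not taken from your code), you can see that outputs computed inside the block are detached from the autograd graph:

```python
import torch
import torch.nn as nn

# Hypothetical toy model and input, used only for illustration.
model = nn.Linear(4, 2)
x = torch.randn(3, 4)

model.eval()  # switches layers such as dropout/batchnorm to eval behaviour
with torch.no_grad():
    out = model(x)  # no graph is recorded for this forward pass

# Outside the block the same forward pass is tracked again,
# because the model's parameters require gradients.
out_tracked = model(x)

print(out.requires_grad, out_tracked.requires_grad)
```

Note that model.eval() and torch.no_grad() do different jobs: the first changes layer behaviour, the second only turns off gradient tracking, so evaluation code typically uses both.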
Also, you can use it to initialize the weights with torch.nn.init functions, since you don’t need the gradients there either.
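A small sketch of the initialization case, assuming a hypothetical nn.Linear layer. As far as I know, the torch.nn.init functions already disable gradient tracking internally in current releases, so the explicit block matters mostly for manual in-place changes such as zeroing the bias:

```python
import torch
import torch.nn as nn

# Hypothetical layer, used only for illustration.
layer = nn.Linear(4, 2)

with torch.no_grad():
    # In-place initializer from torch.nn.init (note the trailing underscore).
    nn.init.xavier_uniform_(layer.weight)
    # Manual in-place change to a parameter; without no_grad() this would
    # raise an error, since the bias is a leaf tensor that requires grad.
    layer.bias.zero_()

# The parameters still require gradients afterwards, so training works as usual.
print(layer.weight.requires_grad, layer.bias)
```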
requires_grad, on the other hand, is an attribute you set when creating a tensor that should track gradients. Usually you don’t need to set it yourself in the beginning, as all parameters which require gradients are already wrapped in nn.Modules.
You could set this attribute on your input tensor, for example, if you need gradients with respect to the input in order to update it, as in an adversarial training setup.
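Here is a rough sketch of how the two can be mixed. It assumes a toy nn.Linear classifier, a random input, and a single sign-gradient step in the style of FGSM; none of these come from your setup, and epsilon is an arbitrary value:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy classifier; its parameters require gradients by default.
model = nn.Linear(4, 2)

# The input is created with requires_grad=True so autograd also tracks it.
x = torch.randn(1, 4, requires_grad=True)
target = torch.tensor([1])  # made-up class label

loss = nn.functional.cross_entropy(model(x), target)
loss.backward()  # fills x.grad as well as the parameters' .grad

epsilon = 0.1  # arbitrary step size, chosen for illustration
with torch.no_grad():
    # The update itself should not be recorded in the graph.
    x_adv = x + epsilon * x.grad.sign()

print(x.grad.shape, x_adv.requires_grad)
```

So requires_grad decides which tensors autograd tracks, while no_grad() temporarily switches tracking off for a block of code, regardless of how the tensors inside it were created.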