Zero the gradients via the optimizer or the module

I am currently confused as to where the zeroing of the gradients has to be performed.

Is the zero_grad() function called on the optimizer or on the module, i.e. the network? I have seen examples of both…

It looks like .zero_grad() exists on both torch.nn.Module and torch.optim.Optimizer, so both will behave the same as long as all of the model's parameters were passed to the optimizer. A minimal sketch of that equivalent case:
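```python
import torch
import torch.nn as nn

# Toy setup (hypothetical names): one module, one optimizer holding all parameters.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Either call clears the same gradients here, because the optimizer
# was constructed with every parameter of the module.
model.zero_grad()        # zeros the grads of all parameters registered on the module
# optimizer.zero_grad()  # equivalent in this setup
```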

However, there can be cases where the model parameters are split into two or more groups and trained with different optimizers. In those cases, calling model.zero_grad() will set the gradients of all model parameters to zero, whereas optim_a.zero_grad() will only clear the gradients of the parameters that were passed to optim_a. A small sketch of that split setup follows.
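```python
import torch
import torch.nn as nn

# Hypothetical example: the weight and bias of one layer are trained
# by two separate optimizers (optim_a and optim_b are made-up names).
model = nn.Linear(4, 2)
optim_a = torch.optim.SGD([model.weight], lr=0.1)
optim_b = torch.optim.SGD([model.bias], lr=0.01)

model(torch.randn(8, 4)).sum().backward()

optim_a.zero_grad()
print(model.weight.grad)  # cleared (zeroed, or None depending on the set_to_none default)
print(model.bias.grad)    # still holds its gradient

model.zero_grad()         # clears the gradients of every parameter of the module
```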


You can use both, but calling it on the optimizer is probably the better practice. You may sometimes be optimizing the parameters of more than one model with a single optimizer, and by zeroing through the optimizer you avoid forgetting to zero the gradients of one of the models. A sketch of that multi-model case:
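```python
import itertools
import torch
import torch.nn as nn

# Hypothetical example: two sub-models (an encoder/decoder pair) trained
# jointly with one optimizer over their combined parameters.
encoder = nn.Linear(4, 3)
decoder = nn.Linear(3, 2)
optimizer = torch.optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=0.1
)

loss = decoder(encoder(torch.randn(8, 4))).sum()
loss.backward()

# One call covers both models; calling encoder.zero_grad() alone
# would leave the decoder's gradients untouched.
optimizer.zero_grad()
```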


Thank you for the concise answers!