Zero the gradients via the optimizer or the module

I am currently confused as to where the zeroing of the gradients has to be performed.

Is the zero_grad() function called on the optimizer or on the module, i.e. the network? I have seen examples of both…

It looks like .zero_grad() exists on both torch.nn.Module and torch.optim.Optimizer, so both will behave the same as long as all of the model's parameters were passed to the optimizer. A minimal sketch of that equivalent case:
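```python
import torch
import torch.nn as nn

# Toy setup (hypothetical names): one module, one optimizer holding all parameters.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Either call clears the same gradients here, because the optimizer
# was constructed with every parameter of the module.
model.zero_grad()        # zeros the grads of all parameters registered on the module
# optimizer.zero_grad()  # equivalent in this setup
```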

However, there can be cases where the model parameters are split into two or more groups and trained with different optimizers. In those cases, calling model.zero_grad() will set the gradients of all model parameters to zero, whereas optim_a.zero_grad() will only clear the gradients of the parameters that were passed to optim_a. A small sketch of that split setup follows.
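```python
import torch
import torch.nn as nn

# Hypothetical example: the weight and bias of one layer are trained
# by two separate optimizers (optim_a and optim_b are made-up names).
model = nn.Linear(4, 2)
optim_a = torch.optim.SGD([model.weight], lr=0.1)
optim_b = torch.optim.SGD([model.bias], lr=0.01)

model(torch.randn(8, 4)).sum().backward()

optim_a.zero_grad()
print(model.weight.grad)  # cleared (zeroed, or None depending on the set_to_none default)
print(model.bias.grad)    # still holds its gradient

model.zero_grad()         # clears the gradients of every parameter of the module
```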


You can use both, but calling it on the optimizer is probably the better practice. You may sometimes be optimizing the parameters of more than one model with a single optimizer, and by zeroing through the optimizer you avoid forgetting to zero the gradients of one of the models. A sketch of that multi-model case:
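```python
import itertools
import torch
import torch.nn as nn

# Hypothetical example: two sub-models (an encoder/decoder pair) trained
# jointly with one optimizer over their combined parameters.
encoder = nn.Linear(4, 3)
decoder = nn.Linear(3, 2)
optimizer = torch.optim.SGD(
    itertools.chain(encoder.parameters(), decoder.parameters()), lr=0.1
)

loss = decoder(encoder(torch.randn(8, 4))).sum()
loss.backward()

# One call covers both models; calling encoder.zero_grad() alone
# would leave the decoder's gradients untouched.
optimizer.zero_grad()
```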


Thank you for the concise answers!