I know that optimizer.zero_grad() zeroes the existing gradients before backpropagating the loss and updating the network parameters. What is nn.Module.zero_grad() used for?
nn.Module.zero_grad() also sets the gradients to 0, for all parameters of that module.
If you created your optimizer like opt = optim.SGD(model.parameters(), ...), then opt.zero_grad() and model.zero_grad() will have the same effect.
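A minimal sketch of that equivalence (the toy nn.Linear model and the learning rate are just placeholders):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
opt = optim.SGD(model.parameters(), lr=0.1)  # opt holds exactly model's parameters

model(torch.randn(8, 4)).sum().backward()

# Because the optimizer was built from model.parameters(), both calls
# clear the same set of .grad attributes:
opt.zero_grad()
# model.zero_grad()  # equivalent in this setup
```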
The distinction is useful for people who have multiple models in the same optimizer, or multiple optimizers for different parts of their model, as in the sketch below.
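For example (an assumed two-part model built with nn.Sequential, with one optimizer per part), each optimizer only clears the gradients of the parameters it was given, while a single zero_grad() on the parent module clears everything:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
opt_backbone = optim.SGD(model[0].parameters(), lr=0.01)
opt_head = optim.SGD(model[1].parameters(), lr=0.1)

model(torch.randn(8, 4)).sum().backward()

opt_backbone.zero_grad()  # clears only model[0]'s gradients; model[1]'s remain
opt_head.zero_grad()      # clears model[1]'s gradients too

# One call on the parent module would have cleared both parts at once:
model.zero_grad()
```

So opt.zero_grad() operates on exactly the parameters registered with that optimizer, while model.zero_grad() operates on all parameters of the module, whichever optimizers they belong to.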