Which one is used when?
https://pytorch.org/docs/stable/optim.html — the docs here use Optimizer.zero_grad().
https://github.com/pytorch/examples/blob/master/dcgan/main.py — the code here uses NetG.zero_grad().
I know that optimizer.zero_grad() zeroes the existing gradients before backpropagating the loss, after which the network parameters are updated. What is nn.Module.zero_grad() used for?
The nn.Module.zero_grad() also sets the gradients to 0 for all parameters.
If you created your optimizer like
opt = optim.SGD(model.parameters(), xxx), then
model.zero_grad() will have the same effect.
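A minimal sketch of that equivalence (the toy Linear model and names here are illustrative, not from the linked examples; note that depending on the PyTorch version, zero_grad() either fills the .grad tensors with zeros or sets them to None):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Toy model and an optimizer built from ALL of its parameters
model = nn.Linear(4, 2)
opt = optim.SGD(model.parameters(), lr=0.1)

# Accumulate some gradients
loss = model(torch.randn(3, 4)).sum()
loss.backward()
assert all(p.grad is not None for p in model.parameters())

# Because opt holds exactly model.parameters(), this call clears
# the same .grad tensors that opt.zero_grad() would clear
model.zero_grad()
assert all(p.grad is None or not p.grad.any() for p in model.parameters())
```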
The distinction is useful for people who have multiple models in the same optimizer, or multiple optimizers for different parts of their model.
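For example, with separate optimizers per sub-module, calling zero_grad() on one module clears only that part's gradients, while optimizer-level calls track whichever parameter groups each optimizer was given (a hypothetical encoder/decoder split for illustration):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Two parts of one pipeline, each with its own optimizer
encoder = nn.Linear(4, 3)
decoder = nn.Linear(3, 4)
opt_enc = optim.SGD(encoder.parameters(), lr=0.1)
opt_dec = optim.SGD(decoder.parameters(), lr=0.01)

loss = decoder(encoder(torch.randn(2, 4))).sum()
loss.backward()  # fills gradients of both parts

# Clear only the encoder's gradients; the decoder's are untouched
encoder.zero_grad()
assert all(p.grad is None or not p.grad.any() for p in encoder.parameters())
assert any(p.grad is not None and p.grad.any() for p in decoder.parameters())
```

Here encoder.zero_grad() and opt_enc.zero_grad() are interchangeable, but neither touches the decoder's gradients.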