I know that optimizer.zero_grad() zeroes the existing gradients before backpropagating the loss and updating the network parameters. What is nn.Module.zero_grad() used for?
nn.Module.zero_grad() also sets the gradients to 0, for all parameters of that module.
If you created your optimizer like opt = optim.SGD(model.parameters(), ...), then opt.zero_grad() and model.zero_grad() will have the same effect.
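A minimal sketch of that equivalence (the toy nn.Linear model and the learning rate are just placeholders):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
opt = optim.SGD(model.parameters(), lr=0.1)  # opt holds exactly model's parameters

model(torch.randn(8, 4)).sum().backward()

# Because the optimizer was built from model.parameters(), both calls
# clear the same set of .grad attributes:
opt.zero_grad()
# model.zero_grad()  # equivalent in this setup
```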
The distinction is useful for people who have multiple models in the same optimizer, or multiple optimizers for different parts of their model, as in the sketch below.
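For example (an assumed two-part model built with nn.Sequential, with one optimizer per part), each optimizer only clears the gradients of the parameters it was given, while a single zero_grad() on the parent module clears everything:

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
opt_backbone = optim.SGD(model[0].parameters(), lr=0.01)
opt_head = optim.SGD(model[1].parameters(), lr=0.1)

model(torch.randn(8, 4)).sum().backward()

opt_backbone.zero_grad()  # clears only model[0]'s gradients; model[1]'s remain
opt_head.zero_grad()      # clears model[1]'s gradients too

# One call on the parent module would have cleared both parts at once:
model.zero_grad()
```

So opt.zero_grad() operates on exactly the parameters registered with that optimizer, while model.zero_grad() operates on all parameters of the module, whichever optimizers they belong to.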