nn.Module.zero_grad() also sets the gradients to 0 for all of the module's parameters.
If you created your optimizer like opt = optim.SGD(model.parameters(), xxx), then opt.zero_grad() and model.zero_grad() will have the same effect.
The distinction is useful for people who have multiple models in the same optimizer, or multiple optimizers for different parts of their model.
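A minimal sketch of the single-model, single-optimizer case, where the two calls are interchangeable (the model and optimizer settings here are placeholders for illustration):

```python
import torch
from torch import nn, optim

# Hypothetical minimal setup: one model whose parameters are all
# registered in one optimizer.
model = nn.Linear(4, 2)
opt = optim.SGD(model.parameters(), lr=0.1)

# Run a forward/backward pass so every parameter gets a gradient.
loss = model(torch.randn(3, 4)).sum()
loss.backward()

# Because opt was built from model.parameters(), these two calls clear
# the gradients of exactly the same tensors. (Recent PyTorch versions
# reset .grad to None by default instead of filling it with zeros,
# which is why the check below allows either outcome.)
opt.zero_grad()
# model.zero_grad()  # equivalent in this single-model, single-optimizer case

for p in model.parameters():
    assert p.grad is None or bool(torch.all(p.grad == 0))
```

With multiple optimizers over disjoint parameter groups, opt.zero_grad() would clear only that optimizer's slice, while model.zero_grad() clears everything in the module.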