What's the difference between Optimizer.zero_grad() and nn.Module.zero_grad()?

nn.Module.zero_grad() also sets the gradients of all of the module's parameters to zero.

If you created your optimizer like opt = optim.SGD(model.parameters(), xxx), then opt.zero_grad() and model.zero_grad() will have the same effect.
The distinction matters for people who have multiple models in the same optimizer, or multiple optimizers for different parts of their model, as in the sketch below.
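
A minimal sketch of the multiple-models-in-one-optimizer case (the model and variable names here are just for illustration, not from the original post):

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical setup: two separate models whose parameters share one optimizer.
model_a = nn.Linear(4, 2)
model_b = nn.Linear(4, 2)
opt = optim.SGD(list(model_a.parameters()) + list(model_b.parameters()), lr=0.1)

x = torch.randn(8, 4)
(model_a(x).sum() + model_b(x).sum()).backward()

# model_a.zero_grad() clears only model_a's gradients; model_b's remain.
model_a.zero_grad()
print(model_b.weight.grad is not None)   # True

# opt.zero_grad() clears the gradients of every parameter it was given,
# i.e. both models here (set to None or zeroed, depending on set_to_none
# and the PyTorch version).
opt.zero_grad()
print(model_b.weight.grad is None or not model_b.weight.grad.any())  # True
```

In the reverse situation (one model, several optimizers over different parameter groups), model.zero_grad() clears everything, while each optimizer's zero_grad() only touches the parameters it was constructed with.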
