When do we need to use optimizer.zero_grad() and not model.zero_grad()?

According to:

model.zero_grad() and optimizer.zero_grad() are the same IF all your model parameters are in that optimizer.

  1. In which situations will the model parameters not be in the optimizer?
  2. Can you give me an example of such a situation?

As you note, most of the time it should make no difference.

  • You might have more than one model in the same optimizer. GANs typically use two separate optimizers, but you might instead have a feature extractor and two separate heads that you train together. Here the optimizer contains more than just one model’s parameters, so calling zero_grad() on any single model would clear only that model’s gradients.
  • Or the optimizer might contain just part of a model (e.g. when finetuning a pretrained model). There is no need to zero the gradients of parameters that don’t require gradients and so cannot have one.
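
To make the two cases above concrete, here is a minimal sketch. The module names (extractor, head_a, head_b, backbone, classifier), the layer sizes, and the has_grad helper are all made up for illustration; they are assumptions, not taken from any particular codebase.

```python
import torch
from torch import nn


def has_grad(module):
    """Return True if any parameter of `module` holds a non-zero gradient.

    Hypothetical helper: it treats both `None` and all-zero gradients as
    "cleared", so it does not depend on how zero_grad() resets them.
    """
    return any(
        p.grad is not None and bool(p.grad.abs().sum() > 0)
        for p in module.parameters()
    )


torch.manual_seed(0)
x = torch.randn(5, 4)

# Case 1: one optimizer holds the parameters of several modules.
extractor = nn.Linear(4, 8)
head_a = nn.Linear(8, 2)
head_b = nn.Linear(8, 3)
params = (
    list(extractor.parameters())
    + list(head_a.parameters())
    + list(head_b.parameters())
)
optimizer = torch.optim.SGD(params, lr=0.1)

features = extractor(x)
loss = head_a(features).pow(2).mean() + head_b(features).pow(2).mean()
loss.backward()

# Zeroing a single module leaves the other modules' gradients in place.
head_a.zero_grad()
case1_after_module_zero = (has_grad(extractor), has_grad(head_a), has_grad(head_b))

# Zeroing through the optimizer clears everything it was given.
optimizer.zero_grad()
case1_after_optim_zero = (has_grad(extractor), has_grad(head_a), has_grad(head_b))

# Case 2: finetuning, where the optimizer only sees the trainable part.
backbone = nn.Linear(4, 8)
for p in backbone.parameters():
    p.requires_grad_(False)  # frozen: these parameters never receive a gradient
classifier = nn.Linear(8, 2)
model = nn.Sequential(backbone, classifier)
ft_optimizer = torch.optim.SGD(classifier.parameters(), lr=0.1)

model(x).pow(2).mean().backward()
frozen_grads_are_none = all(p.grad is None for p in backbone.parameters())

# The frozen parameters have nothing to clear, so this is sufficient.
ft_optimizer.zero_grad()
case2 = (frozen_grads_are_none, has_grad(classifier))

print(case1_after_module_zero)  # (extractor, head_a, head_b) after head_a.zero_grad()
print(case1_after_optim_zero)   # the same modules after optimizer.zero_grad()
print(case2)                    # (frozen grads are None, classifier still has grad)
```

In the first case, head_a.zero_grad() clears only head_a, while optimizer.zero_grad() clears all three modules. In the second case, the frozen backbone never gets gradients, so zeroing through the optimizer covers everything that can have one. Depending on the PyTorch version, zero_grad() either sets gradients to None or fills them with zeros; the helper treats both as cleared.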

But I’d not worry about it too much. In my experience, it is more common to use optimizer.zero_grad(), so I’d probably go with that unless you have a reason not to.

Best regards