What is the difference between opt.zero_grad() and model.zero_grad()?

Hi everyone,

We have a question: what is the difference between optimizer.zero_grad() and model.zero_grad()? Do both need to be called in the training code?

Thanks.

When creating an optimizer, you pass it the parameters it should update.
In most cases you pass all model parameters to a single optimizer, so both calls yield the same result: they zero out the gradients of all parameters.
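As a minimal sketch of that common case (the model shape and learning rate here are arbitrary placeholders):

```python
import torch
import torch.nn as nn

# Arbitrary small model, just for illustration.
model = nn.Linear(4, 2)

# Common setup: a single optimizer holding all model parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Forward/backward pass to populate .grad on every parameter.
loss = model(torch.randn(8, 4)).sum()
loss.backward()

# Because the optimizer was given all of the model's parameters,
# these two calls are interchangeable here: each clears the
# gradients of every parameter in the model.
optimizer.zero_grad()
# model.zero_grad()  # equivalent in this setup
```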

However, you could also pass the first half of the model parameters to optimizer1 and the second half to optimizer2.
In that case, model.zero_grad() would zero out the gradients of all model parameters, while optimizerX.zero_grad() would only zero out the gradients of the parameters that were passed to it, as the sketch below shows.
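Here is a hedged sketch of that split setup (the two-layer model and SGD settings are made up for illustration; note that on recent PyTorch versions zero_grad() sets .grad to None by default rather than filling it with zeros):

```python
import torch
import torch.nn as nn

# Hypothetical two-layer model so the parameters split cleanly in half.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# Each optimizer only knows about part of the model.
optimizer1 = torch.optim.SGD(model[0].parameters(), lr=0.1)
optimizer2 = torch.optim.SGD(model[1].parameters(), lr=0.1)

loss = model(torch.randn(8, 4)).sum()
loss.backward()

# optimizer1.zero_grad() only clears the gradients of model[0]'s
# parameters; model[1]'s gradients are left untouched.
optimizer1.zero_grad()
print(model[0].weight.grad)              # cleared (None or zeros)
print(model[1].weight.grad.abs().sum())  # still nonzero

# model.zero_grad() clears the gradients of *all* parameters,
# regardless of which optimizer they were registered with.
model.zero_grad()
```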
