Do i need to do optimizer.zero_grad() when using Adam solver?

Related: are model.zero_grad() and optimizer.zero_grad() equivalent when using an optimizer?

@Nick_Young yes, the gradient buffers are never zeroed out automatically; if you don't call zero_grad(), gradients from successive backward passes accumulate.
@lgelderloos only if you created your optimizer as optimizer = optim.some_optim_func(model.parameters(), ...). Basically, model.zero_grad() zeroes the gradients of all parameters in the model, while optimizer.zero_grad() zeroes the gradients of all parameters registered with that optimizer. Depending on how you created the optimizer, those two sets of parameters may or may not be the same.
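A minimal sketch illustrating both points: calling backward() twice without zeroing accumulates gradients, and when the optimizer is built from model.parameters(), optimizer.zero_grad() clears the same gradients that model.zero_grad() would. (The layer sizes and learning rate below are arbitrary examples; note that recent PyTorch versions set gradients to None rather than zero by default.)

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)
# Optimizer built from model.parameters(), so optimizer.zero_grad()
# and model.zero_grad() act on the same tensors here.
optimizer = optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 4)
target = torch.randn(8, 2)

# First backward pass: gradients are populated.
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
g1 = model.weight.grad.clone()

# Second backward pass WITHOUT zeroing: gradients accumulate (sum up).
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
assert torch.allclose(model.weight.grad, 2 * g1)

# Zeroing clears the accumulated gradients (newer PyTorch sets them
# to None by default via set_to_none=True).
optimizer.zero_grad()
cleared = model.weight.grad
assert cleared is None or torch.all(cleared == 0)
```

So the usual training-loop pattern is optimizer.zero_grad(), forward, loss.backward(), optimizer.step(), each iteration.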


Thanks for the clarification!

Thank you! @albanD :thumbsup: