Model.zero_grad() or optimizer.zero_grad()?

Hi everyone,

I'm confused about when to use model.zero_grad() versus optimizer.zero_grad(). I have seen some examples use model.zero_grad() and others use optimizer.zero_grad(). Is there a specific case for using one over the other?

I am training a network on speech data.

If you’re referring to:

optimizer = optim.SGD(net.parameters(), lr=0.01)  # SGD requires an explicit lr

They’re the same.
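
Here's a minimal sketch of that case (the toy net and learning rate are just placeholders): since the optimizer was handed every parameter of the model, both calls clear exactly the same .grad tensors.

import torch
import torch.nn as nn
import torch.optim as optim

# Toy model whose parameters all go into a single optimizer.
net = nn.Linear(10, 2)
optimizer = optim.SGD(net.parameters(), lr=0.01)

# Populate some gradients.
loss = net(torch.randn(4, 10)).sum()
loss.backward()

# Either call clears the same set of .grad tensors:
optimizer.zero_grad()   # zeros grads of every param registered with the optimizer
# net.zero_grad()       # equivalent here, since those are all of net's params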

I am using:
optimizer = optim.Adam(model.parameters())

In this case, what should I use?

It's the same whether you use SGD, Adam, RMSprop, etc.

Typically I use optimizer.zero_grad().
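
For context, here's the kind of training-loop skeleton where that call usually goes (model, train_loader, and criterion are assumed to be defined elsewhere):

import torch.optim as optim

# Hypothetical loop; model, train_loader, and criterion are assumed to exist.
optimizer = optim.Adam(model.parameters())  # Adam's default lr is 1e-3

for inputs, targets in train_loader:
    optimizer.zero_grad()        # clear gradients left over from the last batch
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    loss.backward()              # accumulate fresh gradients
    optimizer.step()             # apply the update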

model.zero_grad() and optimizer.zero_grad() are the same IF all of your model's parameters are in that optimizer. I find it safer to call model.zero_grad() to make sure all grads are zero, e.g. if you have two or more optimizers for one model.
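
A quick sketch of that two-optimizer situation (the encoder/decoder split here is hypothetical): each optimizer.zero_grad() only touches the parameters it was given, while model.zero_grad() clears them all.

import torch
import torch.nn as nn
import torch.optim as optim

class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(10, 5)
        self.decoder = nn.Linear(5, 2)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Model()
opt_enc = optim.SGD(model.encoder.parameters(), lr=0.01)
opt_dec = optim.SGD(model.decoder.parameters(), lr=0.001)

loss = model(torch.randn(4, 10)).sum()
loss.backward()

opt_enc.zero_grad()   # only zeros the encoder's grads; decoder grads remain
model.zero_grad()     # zeros grads of *all* parameters, whichever optimizer owns them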
