# The most common question on `optimizer.zero_grad()`. Just re-confirming my understanding

Is it

```
optimizer.zero_grad()
out = model(batch_X)
loss = loss_fn(out, batch_y)
loss.backward()
optimizer.step()
```

or

```
out = model(batch_X)
loss = loss_fn(out, batch_y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

I use the second one, as I believe I need to zero out the existing gradients just before the new ones are calculated.

But my project reviewer on Udacity says to use the first one.
The model is an LSTM.

Also, I think both are the same, aren’t they?


Hi bro,

I also think they are the same.

I have read a thread about the correct order on these forums (I cannot find it now). From the developers’ comments, they recommended the first way you mentioned above.


Thanks a ton, man. Also, in case you stumble upon that thread, just let me know here.
Thanks

Sorry for misremembering it.

In this comment, he just recommended calling `optimizer.zero_grad()` before `.backward()`.

I also agree that both should be the same, and this makes sense, as the gradients are only computed when `backward()` is called…
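To make that concrete, here is a minimal sketch of my own (not from the thread; the linear model, tensor shapes, and MSE loss are arbitrary placeholders) showing that gradients only appear once `backward()` runs, so zeroing them at any earlier point in the same iteration gives identical results:

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# Order A: zero first, then forward/backward.
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
grad_a = model.weight.grad.clone()

# Order B: forward first, zero just before backward.
out = model(x)
loss = torch.nn.functional.mse_loss(out, y)
optimizer.zero_grad()
loss.backward()
grad_b = model.weight.grad.clone()

# Both orders produce identical gradients, since no step() happened in between.
print(torch.allclose(grad_a, grad_b))  # True
```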

I would argue it depends on your “workflow”, as both approaches yield the same result, as others have already said.

I personally prefer the first approach due to my mindset of
“new iteration -> new gradients -> get rid of the old ones”.
Otherwise, I’ve sometimes forgotten to zero out the gradients.
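
As an illustration of that mindset, here is a hedged sketch of a training loop that zeroes gradients at the very top of every iteration; the LSTM sizes, the extra linear head, Adam, and the random batches are placeholders I made up, not details from the thread:

```python
import torch

model = torch.nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
head = torch.nn.Linear(20, 1)
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = torch.nn.MSELoss()

for step in range(5):                    # stand-in for iterating over a DataLoader
    batch_X = torch.randn(16, 8, 10)     # (batch, seq_len, features)
    batch_y = torch.randn(16, 1)

    optimizer.zero_grad()                # first thing each iteration: drop old gradients
    out, _ = model(batch_X)
    loss = loss_fn(head(out[:, -1]), batch_y)
    loss.backward()                      # gradients only exist after this call
    optimizer.step()                     # update, then the next iteration zeroes again
```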
