The most common question on `optimizer.zero_grad()`. Just re-confirming my understanding

I would argue it depends on your “workflow”, since, as others have already said, both approaches yield the same result.

I personally prefer the first approach due to my mindset of
“new iteration -> new gradients -> get rid of the old ones”.
Otherwise I’ve sometimes forgotten to zero out the gradients. :wink:
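
Here is a small sketch to check this, assuming the two approaches are calling `optimizer.zero_grad()` at the start of each iteration versus right after `optimizer.step()` (the earlier posts aren't quoted here, so that is my reading). The `train` helper, its `mode` argument, and the toy data are made up for illustration:

```python
import torch
from torch import nn


def train(mode: str, steps: int = 5) -> torch.Tensor:
    """Train a tiny model; `mode` is a hypothetical switch: "start", "end", or "never"."""
    torch.manual_seed(0)  # same init and data for every run
    model = nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 3)
    y = torch.randn(8, 1)

    for _ in range(steps):
        if mode == "start":
            optimizer.zero_grad()  # new iteration -> get rid of the old gradients
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()            # accumulates into .grad
        optimizer.step()
        if mode == "end":
            optimizer.zero_grad()  # clean up right after the update instead

    return model.weight.detach().clone()


w_start = train("start")
w_end = train("end")
w_never = train("never")  # what happens if you forget to zero the gradients

print(torch.allclose(w_start, w_end))
print(torch.allclose(w_start, w_never))
```

The first two runs should end up with the same weights, while the "never" run drifts away because `backward()` keeps adding to the stale gradients.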
