The most common question about `optimizer.zero_grad()`: just re-confirming my understanding

ptrblck · May 13, 2019, 8:46am

I would argue it depends on your “workflow”, since, as others have already said, both approaches yield the same result.
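To illustrate why both give the same result, here is a minimal sketch. It assumes the two approaches being compared are calling `optimizer.zero_grad()` at the start of each iteration versus right after `optimizer.step()`. The helper `train`, the tiny `nn.Linear` model, the random data, and the learning rate are all hypothetical placeholders. As long as the gradients are cleared once between consecutive `backward()` calls, the parameter updates match.

```python
import torch

def train(zero_at_start: bool, steps: int = 5) -> torch.Tensor:
    """Hypothetical helper: train a tiny linear model, clearing gradients either
    at the start of each iteration or right after optimizer.step()."""
    torch.manual_seed(0)  # same init and data for both runs
    model = torch.nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 3)
    y = torch.randn(8, 1)
    for _ in range(steps):
        if zero_at_start:
            # "new iteration -> new gradients -> get rid of the old ones"
            optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()
        if not zero_at_start:
            # clear gradients after the update, ready for the next iteration
            optimizer.zero_grad()
    return model.weight.detach().clone()

w_start = train(zero_at_start=True)
w_after = train(zero_at_start=False)
print(torch.equal(w_start, w_after))  # True
```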

I personally prefer the first approach because of my mindset of
“new iteration -> new gradients -> get rid of the old ones”.
Otherwise, I’ve sometimes forgotten to zero out the gradients.
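Forgetting matters because `backward()` accumulates into `.grad` instead of overwriting it. A minimal sketch, using a hypothetical two-element parameter `w` and input `x` rather than a real model: the gradient of `(w * x).sum()` with respect to `w` is `x`, so a second `backward()` without zeroing doubles it.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

(w * x).sum().backward()
print(w.grad)  # tensor([3., 4.])

# No zero_grad() in between: the new gradient is added to the old one
(w * x).sum().backward()
print(w.grad)  # tensor([6., 8.])
```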