Should we perform optimizer.zero_grad() before we do forward pass or after that?

As titled: should I call zero_grad() before the forward pass, or do I just need to make sure it runs before loss.backward()?

The forward pass doesn’t care about the grads, nor does it modify them.
You only need to zero the grads before calling .backward().
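A minimal training-loop sketch illustrating this (the toy model, data, and hyperparameters are just placeholders): zero_grad() can run before or after the forward pass, as long as it happens before backward(), since backward() accumulates into any existing grads.

```python
import torch

# Toy model and data, purely illustrative
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)

for _ in range(3):
    out = model(x)                              # forward pass: neither reads nor modifies grads
    loss = torch.nn.functional.mse_loss(out, y)
    optimizer.zero_grad()                       # zeroing here, after the forward pass, is fine
    loss.backward()                             # grads must be zeroed before this accumulates into them
    optimizer.step()
```

Calling zero_grad() right before the forward pass would work just as well; what matters is that stale grads from the previous iteration are cleared before the next backward() adds to them.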


Thanks! That helps a lot.