Regarding optimizer.zero_grad

Hi,

there is no hard rule about when to use it. You should just make sure that the gradients accumulated when you call optimizer.step() are the ones you want.
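For example, you can deliberately let gradients add up over several backward calls before a single step(). A minimal sketch (the toy model, data and accum_steps value are just illustrative):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
accum_steps = 4

optimizer.zero_grad()
for i in range(accum_steps):
    x, y = torch.randn(8, 10), torch.randn(8, 1)
    loss = criterion(model(x), y) / accum_steps  # scale so the summed grads match a full batch
    loss.backward()                              # grads add up in each param.grad across iterations
optimizer.step()                                 # the update uses the accumulated gradients
```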
In general, you want to zero_grad() just before the backward.
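So the usual training loop looks like this (again a sketch with a toy model and random data just for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()

for step in range(100):
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    optimizer.zero_grad()             # clear gradients left over from the previous step
    loss = criterion(model(x), y)
    loss.backward()                   # accumulate fresh gradients into .grad
    optimizer.step()                  # update parameters using exactly those gradients
```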
For more general problems, you can check this thread that discusses this at length: Why do we need to set the gradients manually to zero in pytorch?