Hi everyone, I am new to PyTorch. I wanted to know where optimizer.zero_grad() should be called. I am not sure whether to call it after every batch or after every epoch. Please let me know. Thank you.
There is no hard rule about when to call it. You just need to make sure that the gradients accumulated by the time you call optimizer.step() are the ones you want.
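To make the accumulation concrete, here is a small sketch (the tensor `w` and the toy loss are made up for illustration). It shows that each backward() call adds to the existing .grad rather than overwriting it:

```python
import torch

# Hypothetical toy parameter, just for illustration.
w = torch.tensor([1.0, 2.0], requires_grad=True)

# First backward: d/dw sum(3 * w) = 3 for each element.
(3 * w).sum().backward()
grad_after_first = w.grad.clone()

# Second backward without clearing: the new gradient is added to the old one.
(3 * w).sum().backward()
grad_after_second = w.grad.clone()

# Clear by hand (optimizer.zero_grad() does this for the parameters it manages).
w.grad = None
(3 * w).sum().backward()
grad_after_reset = w.grad.clone()

print(grad_after_first, grad_after_second, grad_after_reset)
```

So if nothing clears the gradients between two backward() calls, the next optimizer.step() will use the sum of both.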
In general, you want to call zero_grad() just before backward(), which in a typical training loop means once per batch, not once per epoch.
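A typical loop following that advice might look like the sketch below. The model, the random data, and the hyperparameters are placeholders I picked for the example, not anything specific to your setup:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Placeholder model, optimizer, and loss, chosen only for illustration.
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Toy data: 3 batches of 8 random samples each.
batches = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

for epoch in range(2):
    for inputs, targets in batches:
        optimizer.zero_grad()                   # clear gradients left over from the previous batch
        loss = loss_fn(model(inputs), targets)  # forward pass
        loss.backward()                         # compute gradients for this batch only
        optimizer.step()                        # update parameters using those gradients
```

If you deliberately want to accumulate gradients over several batches before stepping, you would instead call zero_grad() only after each optimizer.step(), so the placement depends on what you want step() to see.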
For more background, you can check this thread, which discusses it at length: Why do we need to set the gradients manually to zero in pytorch?