I came across some code on GitHub with a strange placement of the call to zero_grad(), and I couldn't explain why it was there or whether placing it a few lines earlier would make any difference.
I would expect this call to come at the beginning of the for loop, before the model makes a prediction. Is there any different effect from putting it where it is in the linked code?
If your code does not do anything unusual that modifies the gradients outside of backward(), you can put zero_grad() anywhere between two backward() calls. You could even put it just after backward() (or after optimizer.step()) if you want.