Zero_grad placement

I came across some code on GitHub with a strange placement of the call to zero_grad, and I couldn't explain why it was there or whether it would make any difference to place it a few lines earlier.

I would expect this call to come at the beginning of the for loop, before the model makes a prediction. Is there any difference in effect from putting it where it is in the linked code?

I think everything is fine as long as zero_grad is executed before backward.

so just being anywhere before backward is the important part?

If your code does not do anything unusual that changes the gradients outside of the backward call, you can put it anywhere between two backward calls. You could even put it just after the backward if you want :slight_smile:
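To illustrate this, here is a minimal sketch (the model, data, and loop count are made up for demonstration) comparing the two placements: zeroing at the top of the loop versus just after the optimizer step. Because gradients are only accumulated during backward, both variants produce identical parameters as long as zero_grad runs somewhere between consecutive backward calls.

```python
import torch

def train(zero_after_step):
    # Same seed in both runs so the model init and data match.
    torch.manual_seed(0)
    model = torch.nn.Linear(4, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)

    for _ in range(3):
        if not zero_after_step:
            opt.zero_grad()  # usual placement: before the forward pass
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()      # accumulates gradients into .grad
        opt.step()
        if zero_after_step:
            opt.zero_grad()  # alternative: right after the update
    return [p.detach().clone() for p in model.parameters()]

params_before = train(zero_after_step=False)
params_after = train(zero_after_step=True)
# Both placements yield the same trained parameters.
```

On the very first iteration the gradients have not been created yet, so skipping the initial zero_grad in the second variant is harmless; what matters is that stale gradients from one backward never leak into the next.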