Why didn’t you guys just make it overwrite? Is there any specific reason for that?
One possible reason I can think of is that a single variable may contribute to multiple losses, and in the backward pass the gradients from all of those losses should be accumulated into it.
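A minimal sketch of what I mean (the values here are just illustrative):

```python
import torch

# A single parameter that feeds into two separate losses.
x = torch.tensor(2.0, requires_grad=True)

loss1 = x ** 2      # d(loss1)/dx = 2x = 4
loss2 = 3 * x       # d(loss2)/dx = 3

loss1.backward()
print(x.grad)       # tensor(4.)

# Without zeroing in between, the second backward pass adds to
# the existing gradient instead of overwriting it.
loss2.backward()
print(x.grad)       # tensor(7.) -- 4 + 3, accumulated
```

If `.backward()` overwrote instead, the contribution from `loss1` would be silently lost here.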
Hi Aerin! Nice to see you here.
I found this post with an answer by @albanD - Why do we need to set the gradients manually to zero in pytorch?
It explains the decision to accumulate gradients when .backward() is called. I assume the same argument applies for .gradient().
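The practical consequence of that design, as discussed in the linked thread, is that the caller decides when to reset gradients. A sketch of the usual training-loop pattern (the model, data, and hyperparameters below are just placeholders):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(3):
    inputs = torch.randn(8, 4)
    targets = torch.randn(8, 1)

    optimizer.zero_grad()   # reset the accumulated gradients explicitly
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()         # gradients accumulate into each param's .grad
    optimizer.step()
```

Skipping `zero_grad()` across iterations gives you gradient accumulation for free, which is exactly the flexibility the accumulate-by-default design buys.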