Why didn’t you guys just make it overwrite? Is there any specific reason for that?
One possible reason I can think of is that a single variable may contribute to multiple losses, and in the backward pass the gradients from all of those losses should be accumulated into it.
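A minimal sketch of what I mean (the values here are just illustrative):

```python
import torch

# A single parameter that feeds into two separate losses.
x = torch.tensor(2.0, requires_grad=True)

loss1 = x ** 2      # d(loss1)/dx = 2x = 4
loss2 = 3 * x       # d(loss2)/dx = 3

loss1.backward()
print(x.grad)       # tensor(4.)

# Without zeroing in between, the second backward pass adds to
# the existing gradient instead of overwriting it.
loss2.backward()
print(x.grad)       # tensor(7.) -- 4 + 3, accumulated
```

If `.backward()` overwrote instead, the contribution from `loss1` would be silently lost here.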
Hi Aerin! Nice to see you here.
I found this post with an answer by @albanD - Why do we need to set the gradients manually to zero in pytorch?
It explains the decision to accumulate gradients when .backward() is called. I assume the same argument applies for .gradient().
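The practical consequence of that design, as discussed in the linked thread, is that the caller decides when to reset gradients. A sketch of the usual training-loop pattern (the model, data, and hyperparameters below are just placeholders):

```python
import torch

model = torch.nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for _ in range(3):
    inputs = torch.randn(8, 4)
    targets = torch.randn(8, 1)

    optimizer.zero_grad()   # reset the accumulated gradients explicitly
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    loss.backward()         # gradients accumulate into each param's .grad
    optimizer.step()
```

Skipping `zero_grad()` across iterations gives you gradient accumulation for free, which is exactly the flexibility the accumulate-by-default design buys.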