Does performance drop due to the default accumulation of parameter gradients?

Just a question out of curiosity: in PyTorch, the current gradients of the parameters are automatically added to the previous ones, so we have to set all the gradients to zero after each update.
Is it true that this causes a performance regression, since we repeatedly do two seemingly unnecessary things: 1. set the previous gradients to zero, and 2. add the current gradients to the (now zeroed) previous gradients?
Also, is there a way to stop the default accumulation of gradients?
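For concreteness, here is a minimal sketch of the kind of loop I mean (the model, data, and optimizer are just placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model and data, only to illustrate the loop in question.
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x, y = torch.randn(32, 10), torch.randn(32, 1)

for step in range(3):
    optimizer.zero_grad()           # 1. reset the previously accumulated gradients
    loss = loss_fn(model(x), y)
    loss.backward()                 # 2. new gradients are *added* into each param.grad
    optimizer.step()
```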

The thing is that you need somewhere to store the gradients. And on modern systems, adding into an existing buffer is much faster than allocating a new one and then copying the values into it.
So this is not really a perf regression: re-using memory is actually a better way to do this.
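To make that concrete, here is a small sketch showing that repeated backward calls add into the existing .grad buffer rather than replacing it (the in-place reuse is an implementation detail, but it is what you typically observe):

```python
import torch

w = torch.randn(3, requires_grad=True)

w.sum().backward()                  # first backward allocates w.grad
first_grad = w.grad.clone()
first_ptr = w.grad.data_ptr()

w.sum().backward()                  # no zeroing in between: gradients accumulate
print(torch.equal(w.grad, 2 * first_grad))   # True: values were added
print(w.grad.data_ptr() == first_ptr)        # True in practice: the same buffer is reused
```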

If you really don't want that behavior, you can set the .grad fields to None instead of zeroing them out; a new Tensor to store the gradients will then be allocated on the next backward pass.
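A minimal sketch of that approach; note that in recent PyTorch releases, zero_grad also accepts a set_to_none flag that does this for you:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
x, y = torch.randn(4, 10), torch.randn(4, 1)

# Option 1: drop the gradient buffers entirely.
for p in model.parameters():
    p.grad = None                   # the next backward() allocates fresh .grad tensors

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Option 2 (recent PyTorch versions): let zero_grad set the fields to None.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
optimizer.zero_grad(set_to_none=True)
```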

So that’s how it is. Thanks for your time.