Why do we need to set the gradients manually to zero in PyTorch?

A more explicit example, in a similar direction to @ruotianluo's, is the ability to accumulate gradients from several forward passes before taking a single optimizer step, for example when training GANs:
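A minimal sketch of what I mean, using a hypothetical toy discriminator and made-up losses (any `nn.Module` and loss would work the same way). The point is that calling `.backward()` twice without zeroing in between *adds* the gradients of the two passes into `.grad`:

```python
import torch

# Toy discriminator and data, purely for illustration.
disc = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(disc.parameters(), lr=0.1)

real = torch.randn(8, 4)
fake = torch.randn(8, 4)

opt.zero_grad()                  # clear gradients once, up front
loss_real = disc(real).mean()    # first forward pass
loss_real.backward()             # gradients are written into .grad
loss_fake = -disc(fake).mean()   # second forward pass
loss_fake.backward()             # gradients are ADDED into .grad
opt.step()                       # one update using the summed gradients
```

Because `backward()` accumulates rather than overwrites, the single `opt.step()` sees the sum of both loss gradients, without you ever having to materialize `loss_real + loss_fake` in one graph.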

If you wanted, you could also use the same trick to train with minibatches larger than what fits in your memory, by accumulating the gradients of several sub-minibatches into one gradient step, but I have not really seen that done much.
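A sketch of that idea, with invented sizes: split the large batch into chunks, backward through each chunk separately, and scale each chunk loss so that the accumulated gradient equals the gradient of the full-batch mean loss.

```python
import torch

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.MSELoss()

big_x = torch.randn(64, 4)   # pretend this batch is too large for memory
big_y = torch.randn(64, 1)
n_chunks = 4                 # process it as 4 sub-minibatches of 16

opt.zero_grad()
for x, y in zip(big_x.chunk(n_chunks), big_y.chunk(n_chunks)):
    # Dividing by n_chunks makes the summed gradients match the
    # gradient of the mean loss over the whole big batch.
    loss = loss_fn(model(x), y) / n_chunks
    loss.backward()          # gradients accumulate across chunks
opt.step()                   # a single update for the whole big batch
```

Only one sub-minibatch's activations are alive at a time, so peak memory is set by the chunk size, not the full batch size.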

Best regards

Thomas
