Standard way to zero gradients?

I am trying:

However, this fails the first time, since .grad doesn't exist yet. I'm thinking there should be some method, e.g. W.zero_grad(), which always succeeds, idempotently.
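A minimal sketch of the situation (assuming W is a leaf tensor with requires_grad=True): .grad is None until the first backward pass, so an unconditional W.grad.zero_() fails, and a guard is needed to make the zeroing idempotent.

```python
import torch

W = torch.randn(3, requires_grad=True)

# Before any backward pass, W.grad is None, so W.grad.zero_() would
# raise AttributeError. Guarding makes the zeroing idempotent:
if W.grad is not None:
    W.grad.zero_()

loss = (W * 2).sum()
loss.backward()

W.grad.zero_()  # now .grad exists, so this succeeds
```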

I think that zero_grad only exists on nn.Module and torch.optim.Optimizer, and it fills all the parameters' gradients with zeros. So if your parameter W is part of a module M, you can call M.zero_grad() directly.
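For example (a sketch with an arbitrary nn.Linear standing in for the module M; note that in recent PyTorch versions zero_grad defaults to set_to_none=True, so gradients may be reset to None rather than filled with zeros):

```python
import torch
import torch.nn as nn

M = nn.Linear(4, 2)  # hypothetical module holding the parameter W
out = M(torch.randn(1, 4)).sum()
out.backward()

# Resets every parameter's gradient in one call; depending on the
# PyTorch version this zeros the buffers or sets them to None.
M.zero_grad()
```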



Would W.grad = None work for you?
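A sketch of that suggestion: assigning None always succeeds, whether or not a gradient exists yet, and autograd allocates a fresh .grad buffer on the next backward pass.

```python
import torch

W = torch.randn(3, requires_grad=True)
W.grad = None  # succeeds even before any backward pass

(W * 2).sum().backward()
W.grad = None  # discards the accumulated gradient

(W * 2).sum().backward()  # autograd allocates a fresh .grad buffer
```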

Best regards


W.grad = None is not bad. But won't that cause a reallocation, and therefore a CUDA-side sync point?