Standard way to zero gradients?

I am trying:

W.grad.data.fill_(0)

However, this fails on the first call, since .grad doesn't exist yet. I'm thinking maybe there should be a method like, e.g., W.zero_grad(), which would always succeed, idempotently.
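What I have in mind would behave roughly like this manual guard (a rough, untested sketch; it just skips the zeroing until .grad exists):

import torch

W = torch.randn(3, 3, requires_grad=True)

# .grad is None until the first backward pass, so only zero it in place once it exists
if W.grad is not None:
    W.grad.zero_()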

I think that zero_grad() only exists on nn.Module and torch.optim.Optimizer, and it fills the gradients of all the parameters with zeros. So if your parameter W is part of a module M, you can directly call:

M.zero_grad()
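For example (a quick sketch using nn.Linear as a stand-in for your module; the exact module doesn't matter):

import torch
import torch.nn as nn

M = nn.Linear(4, 2)       # stand-in module; W would be one of its parameters
x = torch.randn(1, 4)
M(x).sum().backward()     # populates .grad on M's parameters

M.zero_grad()             # clears the gradients of every parameter in M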

Hello,

Would W.grad = None work for you?
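Something like this (untested sketch; autograd simply allocates a fresh .grad on the next backward pass):

import torch

W = torch.randn(3, 3, requires_grad=True)
(W * 2).sum().backward()   # W.grad now exists

W.grad = None              # drop the gradient tensor entirely
(W * 2).sum().backward()   # a new .grad is allocated here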

Best regards

Thomas

W.grad = None is not bad. But won't that cause a reallocation, and therefore a CUDA-side sync point?