Standard way to zero gradients?

I am trying:

However, this fails the first time, since .grad doesn't exist yet. I'm thinking there should be some method, e.g. W.zero_grad(), which always succeeds, idempotently.
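A minimal sketch of the situation (assuming W is a leaf tensor with requires_grad=True): .grad is None until the first backward pass, so an unconditional W.grad.zero_() fails, and a guard is needed to make the zeroing idempotent.

```python
import torch

W = torch.randn(3, requires_grad=True)

# Before any backward pass, W.grad is None, so W.grad.zero_() would
# raise AttributeError. Guarding makes the zeroing idempotent:
if W.grad is not None:
    W.grad.zero_()

loss = (W * 2).sum()
loss.backward()

W.grad.zero_()  # now .grad exists, so this succeeds
```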

I think that zero_grad only exists on nn.Module and torch.optim.Optimizer, and it fills all the parameters' gradients with zeros. So if your parameter W is part of a module M, you can call M.zero_grad() directly.
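For example (a sketch with an arbitrary nn.Linear standing in for the module M; note that in recent PyTorch versions zero_grad defaults to set_to_none=True, so gradients may be reset to None rather than filled with zeros):

```python
import torch
import torch.nn as nn

M = nn.Linear(4, 2)  # hypothetical module holding the parameter W
out = M(torch.randn(1, 4)).sum()
out.backward()

# Resets every parameter's gradient in one call; depending on the
# PyTorch version this zeros the buffers or sets them to None.
M.zero_grad()
```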



Would W.grad = None work for you?
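A sketch of that suggestion: assigning None always succeeds, whether or not a gradient exists yet, and autograd allocates a fresh .grad buffer on the next backward pass.

```python
import torch

W = torch.randn(3, requires_grad=True)
W.grad = None  # succeeds even before any backward pass

(W * 2).sum().backward()
W.grad = None  # discards the accumulated gradient

(W * 2).sum().backward()  # autograd allocates a fresh .grad buffer
```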

Best regards


W.grad = None is not bad. But won't that cause a reallocation, and therefore a CUDA-side sync point?