I am trying:
W.grad.data.fill_(0)
However, this fails the first time, since .grad doesn't exist yet. I'm thinking maybe there should be some method, e.g. W.zero_grad(), which would always succeed, idempotently.
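In the meantime, the first-iteration failure can be worked around by checking whether .grad exists before filling it. A minimal sketch (the tensor W here is a standalone example leaf tensor, not from the original post):

```python
import torch

W = torch.randn(3, 3, requires_grad=True)

# Guard against the first iteration, where .grad does not exist yet (it is None)
if W.grad is not None:
    W.grad.data.fill_(0)

loss = (W * W).sum()
loss.backward()

# After backward, .grad exists and can be zeroed in place
W.grad.data.fill_(0)
```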
I think that zero_grad() only exists on nn.Module
and torch.optim.Optimizer
, and it fills the gradients of all the parameters with zeros. So if your parameter W is part of a module M, you can directly call:
M.zero_grad()
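For example, a minimal sketch (M here is a hypothetical nn.Linear standing in for whatever module owns W; note that in recent PyTorch versions zero_grad() sets gradients to None by default rather than filling them with zeros):

```python
import torch
import torch.nn as nn

# Hypothetical module M containing the parameter of interest
M = nn.Linear(4, 2)

# Run a forward/backward pass so the parameters have gradients
M(torch.randn(5, 4)).sum().backward()

# Resets the .grad of every parameter in M (zeroed, or set to None on newer versions)
M.zero_grad()
```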
Hello,
would W.grad = None
work for you?
Best regards
Thomas
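The suggestion above can be sketched as follows (a minimal example with a standalone leaf tensor W, not taken from the original post):

```python
import torch

W = torch.randn(3, 3, requires_grad=True)

# Setting .grad to None always succeeds, even before the first backward pass
W.grad = None

(W * W).sum().backward()
assert W.grad is not None  # autograd allocated a fresh gradient tensor

# Resetting to None is idempotent and frees the gradient memory
W.grad = None
```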
W.grad = None is not bad. But won't that cause a reallocation, and therefore a CUDA-side sync point?