In the most popular A3C PyTorch implementation, there is a function ensure_shared_grads that makes the local model and the globally shared model share gradients. In line 18, shared_param._grad = param.grad is used to hand the local model's gradients over to the shared model. However, only shared_param.grad is checked to decide whether the gradients are already shared on subsequent backward passes. What's the difference between _grad and .grad? (A rough sketch of the function is included below for reference.)
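For context, the relevant part of that helper looks roughly like this (paraphrased, so the exact line numbers and details may differ slightly from the repo):

    def ensure_shared_grads(model, shared_model):
        # Copy each local parameter's gradient into the corresponding shared
        # parameter, but only once: as soon as the shared parameter has a
        # .grad, the two keep pointing at the same gradient tensor.
        for param, shared_param in zip(model.parameters(), shared_model.parameters()):
            if shared_param.grad is not None:
                return
            shared_param._grad = param.grad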
_grad is an internal attribute that is writable from Python. You'll notice that .grad is read-only.
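In practice, that assignment makes both parameters reference the very same gradient tensor, so later accumulation into the local param.grad is visible through shared_param.grad as well. A minimal sketch with two small Linear modules (my own illustration, not the A3C code):

    import torch

    model = torch.nn.Linear(2, 1)
    shared_model = torch.nn.Linear(2, 1)

    # Produce gradients on the local model.
    loss = model(torch.randn(4, 2)).sum()
    loss.backward()

    # Share each local gradient tensor with the corresponding shared parameter.
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is None:
            shared_param._grad = param.grad

    # Both now point at the same tensor object.
    print(shared_model.weight.grad is model.weight.grad)  # True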
So, if I want to change the gradients manually inside an optimizer, must I use the following code?

    for group in meta_optim.param_groups:  # meta_optim is the optimizer
        for p, g in zip(group['params'], grads):  # grads were calculated manually
            if p.grad is not None:
                p._grad.data = g.data  # rather than p.grad.data = g.data?
Hey – according to these docs Variable is deprecated, and it seems that you can write to the Tensor.grad attribute now. For example (Python 3.7, torch v1.2.0):
    # Case 1: an ordinary tensor
    import torch

    a = torch.randn(1, requires_grad=True)
    assert a.grad is None
    a.grad = torch.ones_like(a)           # writing .grad directly works
    a.grad.data = 2 * torch.ones_like(a)  # and so does writing through .data
    print(a.grad)  # prints tensor([2.])
    # Case 2: parameters in an nn.Module
    class MyModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(1, 1)

    model = MyModel()
    for param in model.parameters():
        assert param.grad is None
        param.grad = torch.ones_like(param.data)
        param.grad.data = 2 * torch.ones_like(param.data)
        print(param.grad)  # more 2's, no error
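That also answers the earlier optimizer question: on versions where .grad is writable, you can assign the manually computed gradients to p.grad directly instead of going through p._grad. A minimal sketch (a single param group is assumed; meta_optim and grads follow the names from the question):

    import torch

    model = torch.nn.Linear(1, 1)
    meta_optim = torch.optim.SGD(model.parameters(), lr=0.1)

    # "Manually computed" gradients, one per parameter, in the same order.
    grads = [torch.ones_like(p) for p in model.parameters()]

    for group in meta_optim.param_groups:
        for p, g in zip(group['params'], grads):
            p.grad = g  # instead of p._grad.data = g.data

    meta_optim.step()  # updates the parameters using the gradients set above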