What is the difference between .grad and ._grad?

In the most popular A3C PyTorch implementation, there is a function ensure_shared_grads that ensures the local model and the globally shared model share gradients.

At line 18, shared_param._grad = param.grad is used to share the local model's gradients with the globally shared model. However, only shared_param.grad is checked to make sure the gradients are shared in subsequent backward passes.
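
For reference, the function in question looks roughly like this (a paraphrase based on the description above, not the exact source):

def ensure_shared_grads(model, shared_model):
    # copy the locally computed gradients onto the shared model's parameters
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad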

What’s the difference between .grad and ._grad ?

Thanks!


_grad is an internal variable that is writable from Python. You'll notice that .grad is read-only.
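
A quick way to see the relationship for yourself (a minimal sketch; exact behaviour may differ across PyTorch versions):

import torch

x = torch.randn(3, requires_grad=True)
x.sum().backward()

# both names expose the same underlying gradient tensor
print(x.grad)
print(x._grad)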


So, if I want to change the gradients manually in the optimizer, do I have to use the following code?

for group in meta_optim.param_groups:  # meta_optim is optimizer
    for p, g in zip(group['params'], grads):  # grads were calculated manually.
        if p.grad is not None:
            p._grad.data = g.data  # rather than p.grad.data = g.data?
meta_optim.step()

Hey, according to these docs Variable is deprecated, and it seems that you can write to the Tensor.grad attribute now. For example (Python 3.7, torch v1.2.0):

import torch

# case 1, an ordinary Tensor
a = torch.randn(1, requires_grad=True)
assert a.grad is None
a.grad = torch.ones_like(a)
a.grad.data = 2 * torch.ones_like(a)
print(a.grad)  # prints tensor([2.])

# case 2, parameters in an nn.Module
class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(1,1)
    
model = MyModel()
for param in model.parameters():
    assert param.grad is None
    param.grad = torch.ones_like(param.data)
    param.grad.data = 2 * torch.ones_like(param.data)
    print(param.grad)  # more 2's, no error
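
Tying that back to the meta_optim loop above: on a recent PyTorch version you should be able to assign the manually computed gradients straight to p.grad. A minimal sketch, with model and grads standing in for the real objects:

import torch

model = torch.nn.Linear(2, 1)  # stand-in for the real model
meta_optim = torch.optim.SGD(model.parameters(), lr=0.1)

# pretend these were the manually calculated gradients
grads = [torch.ones_like(p) for p in model.parameters()]

for group in meta_optim.param_groups:
    for p, g in zip(group['params'], grads):
        p.grad = g  # no need to go through ._grad here
meta_optim.step()  # applies the assigned gradients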