Difference between .grad and ._grad

I notice that the Variable class has two similar members, .grad and ._grad. What is the difference between them? In the pytorch-a3c implementation there is this piece of code:

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad

Why update member ._grad of the shared_model?


Variable.grad is not writable, while Variable._grad is. If you want to force the gradient to take a particular value, you must assign to ._grad and not .grad; otherwise, you will get the exception:

AttributeError: attribute 'grad' of 'torch._C._VariableBase' objects is not writable

I just checked this with a simple script:

import torch
from torch.autograd import Variable
from torch.optim import SGD

x = Variable(torch.zeros(3, 4), requires_grad=True)

a = (3 * x).sum()
a.backward()  # populate x.grad so that x.grad.size() below is defined

oa = SGD([x], momentum=0.0, lr=1.0)

for i in range(100):
    print('-' * 40)
    # writable: this replaces the gradient the optimizer will use
    x._grad = Variable(torch.ones(x.grad.size()))
    oa.step()

However this behavior seems undocumented…


However this solution seems undocumented

I sense a solution for this 🙂.

As a bonus, you then get the snazzy ‘contributor’ annotation after your name in anything you post on the Issues page.


It’s undocumented because it’s not meant to be used by the user. Generally, everything starting with _ is treated as internal.


If we want to manually set the gradient of a variable, would you recommend using ._grad ?


I think soumith is pretty clear about this:

It is not meant to be used by the user.

So if you wanted to manually set the gradient of a variable, what would be the correct command?

A gradient is defined with respect to some operations; it has a precise mathematical meaning.

There is no point in modifying it manually. You may instead want to change how the gradient is used, for example when defining the gradient-descent step (in that case you can write something that inherits from torch.optim.Optimizer).
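For illustration, here is a minimal sketch of that idea. The class name SignSGD and its update rule are made up for this example; the only point is that the custom step() decides how gradients are used, without ever writing to .grad or ._grad:

```python
import torch
from torch.optim import Optimizer

class SignSGD(Optimizer):
    """Toy optimizer: steps in the direction of -sign(grad).

    Hypothetical example, not part of torch.optim.
    """

    def __init__(self, params, lr=0.1):
        super().__init__(params, dict(lr=lr))

    @torch.no_grad()
    def step(self, closure=None):
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is not None:
                    # use the gradient's sign instead of its raw value
                    p.add_(p.grad.sign(), alpha=-group['lr'])

x = torch.zeros(3, requires_grad=True)
opt = SignSGD([x], lr=0.5)
(3 * x).sum().backward()  # gradient of each entry is 3
opt.step()                # each entry moves by -0.5 * sign(3) = -0.5
```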

what about the case in my example at the top?

If you want to modify the gradient, use hooks. This tutorial introduces hooks: http://pytorch.org/tutorials/beginner/former_torchies/nn_tutorial.html#forward-and-backward-function-hooks
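A minimal sketch of such a hook, using the current tensor API (the doubling is arbitrary, just to show the gradient being modified on its way into .grad):

```python
import torch

x = torch.zeros(3, requires_grad=True)
y = (3 * x).sum()

# The hook receives the gradient and may return a modified version,
# which autograd then uses in place of the original.
x.register_hook(lambda grad: grad * 2)

y.backward()
print(x.grad)  # 3 * 2 = 6 in every entry, without ever touching ._grad
```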

So should users forget that ‘._grad’ exists and just use ‘.grad’?

what do you mean?
If you do not set the parameter retain_graph to True, can the loss only be backpropagated once?
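A small sketch of the behavior being asked about (the values are only illustrative; x * x is used because its backward needs saved tensors, which are freed after a non-retaining backward):

```python
import torch

x = torch.ones(3, requires_grad=True)
loss = (x * x).sum()  # backward needs the saved tensor x

loss.backward(retain_graph=True)  # keep the graph alive
loss.backward()                   # second backward is fine; grads accumulate
print(x.grad)                     # 2x + 2x = 4 in every entry

loss2 = (x * x).sum()
loss2.backward()                  # graph is freed by default after this
try:
    loss2.backward()              # second backward now fails
except RuntimeError:
    print("cannot backward through the freed graph a second time")
```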