I noticed that the Variable class has two similar members, .grad and ._grad. What is the difference between them? In the PyTorch A3C implementation pytorch-a3c, there is this piece of code:

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return
        shared_param._grad = param.grad
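For context, here is a hedged sketch of how this function is typically used in an A3C-style update: a worker backpropagates through its local copy of the model, the shared model adopts the worker's gradient tensors (once, via ._grad), and then the shared optimizer steps. The Linear model, batch, and loss below are toy stand-ins, not code from pytorch-a3c.

```python
import torch
import torch.nn as nn

def ensure_shared_grads(model, shared_model):
    for param, shared_param in zip(model.parameters(), shared_model.parameters()):
        if shared_param.grad is not None:
            return  # shared model already points at the worker's grad tensors
        shared_param._grad = param.grad

# Toy stand-ins for the A3C shared model and a worker's local copy.
shared_model = nn.Linear(4, 2)
shared_model.share_memory()
local_model = nn.Linear(4, 2)
local_model.load_state_dict(shared_model.state_dict())

# The optimizer holds the *shared* parameters.
optimizer = torch.optim.SGD(shared_model.parameters(), lr=0.01)

# Worker step: backprop through the local model, hand grads to the
# shared model, then step the shared optimizer.
loss = local_model(torch.randn(3, 4)).sum()
optimizer.zero_grad()
loss.backward()
ensure_shared_grads(local_model, shared_model)
optimizer.step()
```

After the first call, the shared parameters and the local parameters reference the very same gradient tensors, so subsequent calls return early.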

Variable.grad is not writable, while Variable._grad is. If you want to force the gradient to take a particular value, you must assign to ._grad rather than .grad; otherwise you get the exception:

AttributeError: attribute 'grad' of 'torch._C._VariableBase' objects is not writable

import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.optim import SGD

x = Variable(torch.zeros(3, 4), requires_grad=True)
oa = SGD([x], momentum=0.0, lr=1.0)
for i in range(100):
    oa.zero_grad()
    a = (3 * x).sum()  # rebuild the graph each iteration so backward() can run again
    a.backward()
    print('-' * 40)
    print(x.grad)
    x._grad = Variable(torch.ones(x.grad.size()))  # overwrite the computed gradient
    print(x.grad)
    oa.step()
    print(x)

A gradient is defined with respect to some operation, so it has a precise mathematical meaning.

There is usually no point in modifying it manually. What you may actually want to change is how the gradient is used, for example to customize the gradient descent update (in that case you can write a class inheriting from torch.optim.Optimizer).
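As a minimal sketch of that last suggestion: a custom optimizer can transform the gradient inside step() without ever writing to .grad. The SignSGD class below (an illustrative name, not a PyTorch class) steps by the sign of the gradient instead of the gradient itself.

```python
import torch
from torch.optim import Optimizer

class SignSGD(Optimizer):
    """Illustrative optimizer: descend along sign(grad) instead of grad."""

    def __init__(self, params, lr=0.1):
        defaults = dict(lr=lr)
        super(SignSGD, self).__init__(params, defaults)

    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                # Use a transformed gradient; p.grad itself is left untouched.
                p.data.add_(torch.sign(p.grad.data), alpha=-group['lr'])
        return loss

# Usage: each step moves every coordinate by exactly lr in the
# direction opposite to its gradient's sign.
x = torch.nn.Parameter(torch.tensor([2.0, -3.0]))
opt = SignSGD([x], lr=0.1)
loss = (x * torch.tensor([1.0, -1.0])).sum()  # grad of x is [1, -1]
loss.backward()
opt.step()
print(x.data)
```

This keeps the mathematical gradient intact while changing only how the update rule consumes it.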