Autograd: `requires_grad=True`

andrew_su · July 7, 2018, 7:29pm

Hello

I am very confused about gradients during back-propagation. According to PyTorch's Autograd documentation, if we want a tensor/variable to carry gradient information, we need to set requires_grad=True. I have gone through many examples on the web, for instance the REINFORCE algorithm example or the policy gradient example, and none of them set requires_grad=True.

In that case, when they call loss.backward() without a single tensor marked requires_grad=True, what exactly is happening? Do they even do back-propagation?

Naruto-Sasuke · July 7, 2018, 11:54pm

Hi, we do not need the gradient of the input. In most cases it is useless, except in special work such as neural style transfer, where only the input is changed iteratively to minimize the total loss. Usually we only want to train the model, and the parameters in each layer default to requires_grad=True.
So there is nothing to worry about.
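
To make the point above concrete, here is a minimal sketch (not taken from any of the linked examples; the layer sizes and variable names are made up for illustration). It checks that the parameters of an nn.Linear layer already have requires_grad=True, that a plain input tensor does not, and that loss.backward() still fills in the parameter gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

layer = nn.Linear(4, 2)   # its weight and bias are nn.Parameter objects
x = torch.randn(3, 4)     # plain input tensor: requires_grad defaults to False

loss = layer(x).sum()
loss.backward()           # back-propagates into the layer's parameters

param_flags = [p.requires_grad for p in layer.parameters()]
print(param_flags)                  # flags for weight and bias
print(x.requires_grad, x.grad)      # the input was never tracked
print(layer.weight.grad.shape)      # gradient has the same shape as the weight

# Opting in explicitly, as one would for something like style transfer,
# where the input itself is the thing being optimized (illustrative only).
x_opt = torch.randn(3, 4, requires_grad=True)
layer(x_opt).sum().backward()
print(x_opt.grad.shape)
```

So requires_grad=True only has to be written by hand for tensors you create yourself and want gradients for; anything that comes from an nn.Module's parameters is already covered.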

andrew_su · July 9, 2018, 12:09am

Thanks Naruto for your input. Are you saying that if we build a network in the following fashion, we don't need to worry about setting requires_grad=True on any tensor?
I am confused because I was reading the autograd tutorial, which emphasizes requires_grad=True, whereas the example below doesn't mention it at all.

So in short, my question is: do we only need requires_grad=True when we build a network from scratch (as in the PyTorch tutorial)? If we build the network in the fashion shown in the code below, do we not need to worry about it?

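import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class Policy(nn.Module):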
    def __init__(self):
        super(Policy, self).__init__()
        self.affine1 = nn.Linear(4, 128)
        self.affine2 = nn.Linear(128, 2)

        self.saved_log_probs = []
        self.rewards = []

    def forward(self, x):
        x = F.relu(self.affine1(x))
        action_scores = self.affine2(x)
        return F.softmax(action_scores, dim=1)


policy = Policy()
optimizer = optim.Adam(policy.parameters(), lr=1e-2)
eps = np.finfo(np.float32).eps.item()

Naruto-Sasuke · July 9, 2018, 2:39am

This is a snippet of the Linear layer. As you can see, the learnable weights are registered as Parameter, which defaults to requires_grad=True (see here). The input to the network needs no gradient (it is useless in most cases), so everything is fine.

    def __init__(self, in_features, out_features, bias=True):
        super(Linear, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        if bias:
            self.bias = Parameter(torch.Tensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()
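
As a rough illustration of what that registration buys you, here is a small sketch. TinyLinear is a hypothetical module written only for this example (it is not part of PyTorch), and the `scale` attribute is there just to contrast a plain tensor with a Parameter.

```python
import torch
import torch.nn as nn

class TinyLinear(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, in_features, out_features):
        super().__init__()
        # Wrapping in nn.Parameter registers the tensor with the module
        # and sets requires_grad=True by default.
        self.weight = nn.Parameter(torch.randn(out_features, in_features))
        # A plain tensor attribute is not registered and is not tracked.
        self.scale = torch.ones(1)

    def forward(self, x):
        return (x @ self.weight.t()) * self.scale

m = TinyLinear(4, 2)
names = [name for name, _ in m.named_parameters()]
print(names)                     # only the Parameter shows up
print(m.weight.requires_grad)    # default for nn.Parameter
print(m.scale.requires_grad)     # default for a plain tensor

# Freezing is the case where you do touch the flag by hand.
m.weight.requires_grad_(False)
out = m(torch.randn(3, 4))
print(out.requires_grad)         # nothing upstream is tracked any more
```

This is also why optim.Adam(policy.parameters(), ...) finds the right tensors: parameters() returns exactly the registered Parameter objects.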

andrew_su · July 9, 2018, 4:45am

Thanks Naruto. That clarifies things a lot for me.

One more question: I see many examples that use autograd.Variable, but Variable is now deprecated. Do you see any need to use Variable?

Naruto-Sasuke · July 10, 2018, 6:29am

I cannot think of a single case; it is beyond me…
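
For what it's worth, here is a quick sketch of how this looks, assuming PyTorch 0.4 or later, where Tensor and Variable were merged. Wrapping a tensor in Variable still works but simply hands back a tensor, so setting requires_grad on the tensor directly does the same job.

```python
import torch
from torch.autograd import Variable

t = torch.ones(2, 2)

# Old style: still accepted, but the result is just a tensor.
v = Variable(t, requires_grad=True)
is_tensor = isinstance(v, torch.Tensor)
print(is_tensor, v.requires_grad)

# Current style: ask for gradient tracking when creating the tensor.
w = torch.ones(2, 2, requires_grad=True)
(w * 3).sum().backward()
print(w.grad)   # d/dw of sum(3 * w) is 3 for every element
```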

erm · December 6, 2019, 5:43am

Wow, I had the same questions after following the tutorials, thanks for clarifying. I was scratching my head wondering what my network was doing with no requires_grad=True anywhere.