Aren't weights of models non-leaves? So how do they have their grad values?

I know it's obvious why weights need to have gradient values, but I have a particular question about leaves and non-leaves. Please read on; this question is not as long as it seems.

So I understand that .grad is not populated for intermediate nodes, only for leaf nodes (however, the reasoning for this is still not perfectly clear to me, as discussed here).

Example of no grad value for an intermediate node (feel free to skip directly to the question below):

import torch

a = torch.rand(10, requires_grad=True)   # leaf tensor
b = torch.rand(10)
c = a + b                                # intermediate (non-leaf) tensor

d = torch.tensor(5.0)
e = c * d

e.mean().backward()
print(a.grad)
print(c.grad)

Here a.grad should give me an output since a is a leaf node; however, c is an intermediate node, so c.grad shouldn't give me any output, which is exactly what we get:

Output:

tensor([0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
        0.5000])
C:\Anaconda\envs\robot\lib\site-packages\torch\tensor.py:746: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by 
mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "
None

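As the warning itself suggests, if you do want the gradient of a non-leaf such as c, you can call .retain_grad() on it before backward(). A minimal sketch of the same example with that one extra call:

import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10)
c = a + b
c.retain_grad()        # explicitly ask autograd to keep this non-leaf's gradient

d = torch.tensor(5.0)
e = c * d

e.mean().backward()
print(a.grad)          # populated, a is a leaf
print(c.grad)          # now populated too, no warning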

MY QUESTION is: in any model that we create, with multiple layers, aren't the weights of each layer non-leaves? So how come they have grad values when other non-leaves do not?

For example:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorNet(nn.Module):
    def __init__(self, obs_space, action_space):
        super().__init__()
        self.fc1 = nn.Linear(obs_space, 32)
        self.fc2 = nn.Linear(32, action_space)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x

Here, isn't the layer fc2 a non-leaf? However, its weights will still definitely have a grad value.

Hi Siddharth!

On the contrary, fc2.weight (and fc2.bias) are leaf tensors.

In the forward pass the values of fc2.weight do not depend on
the values of x (nor on the values in fc1). They only change (in
the conventional use case) when you call optimizer.step().
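One quick way to see this (just a sketch, instantiating your ActorNet with arbitrary sizes) is to check .is_leaf:

net = ActorNet(4, 2)            # arbitrary obs_space / action_space
out = net(torch.rand(1, 4))

print(net.fc2.weight.is_leaf)   # True  - the parameter is created directly, not computed
print(out.is_leaf)              # False - the output is computed from the inputs and weights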

Are you imagining that only the first layer is a leaf, that is that in:

    def forward(self, x):
        # x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x

fc2 becomes a leaf because we commented out fc1?

To reiterate, the values of x of course depend upon whether fc1
was called, but the values of fc2.weight do not.
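And because fc2.weight is a leaf with requires_grad=True, backward() populates its .grad just like a.grad in your first snippet. Again only a sketch with made-up sizes:

net = ActorNet(4, 2)
out = net(torch.rand(1, 4))
out.sum().backward()

print(net.fc2.weight.grad)      # populated - fc2.weight is a leaf that requires grad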

Best.

K. Frank


@KFrank
Thanks for the reply. I now see the obvious reason why the weights are leaf tensors rather than non-leaves. I was imagining the graph completely wrong in my head. Thanks for helping out.