I know it's obvious why weights need to have gradient values, but I have a specific question about leaves and non-leaves. Please read on; this question is not as long as it seems.
So I understand that `.grad` is not populated for any intermediate node, only for leaf nodes (however, the reasoning is still not perfectly clear to me, like here).
Example of no grad value for an intermediate node (feel free to skip directly to the question below):
```python
import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10)
c = a + b            # intermediate (non-leaf) node
d = torch.tensor(5.0)
e = c * d
e.mean().backward()
print(a.grad)
print(c.grad)
```
Here `a.grad` should give me output since `a` is a leaf node, while `c` is an intermediate node, so `c.grad` shouldn't give me any output, which is exactly what we get:
```
tensor([0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
        0.5000, 0.5000])
C:\Anaconda\envs\robot\lib\site-packages\torch\tensor.py:746: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more informations.
  warnings.warn("The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad "
None
```
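As a side note, the warning itself points at a workaround I tried: if you do want the gradient of an intermediate tensor, you can call `.retain_grad()` on it before `backward()`. A minimal sketch of the same example with that one extra call:

```python
import torch

a = torch.rand(10, requires_grad=True)
b = torch.rand(10)
c = a + b         # intermediate (non-leaf) tensor
c.retain_grad()   # ask autograd to keep c's gradient after backward()
d = torch.tensor(5.0)
e = c * d
e.mean().backward()

print(a.grad)  # populated: a is a leaf with requires_grad=True
print(c.grad)  # now also populated, thanks to retain_grad()
```

With `e = 5 * c` and the mean over 10 elements, both gradients come out as 0.5 per element.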
MY QUESTION is: in any model that we create, with multiple layers, aren't the weights of each layer non-leaves? So how come they have grad values when other non-leaves do not?
```python
import torch.nn as nn
import torch.nn.functional as F

class ActorNet(nn.Module):
    def __init__(self, obs_space, action_space):
        super().__init__()
        self.fc1 = nn.Linear(obs_space, 32)
        self.fc2 = nn.Linear(32, action_space)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x
```
Here, isn't the layer `fc2` a non-leaf? However, its weights will still definitely have a grad value.
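One way I thought of probing this is to inspect `.is_leaf` on the parameters themselves, since each `nn.Linear` holds its weight and bias as `nn.Parameter` tensors. A quick check (the constructor arguments 4 and 2 are just placeholder sizes):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorNet(nn.Module):
    def __init__(self, obs_space, action_space):
        super().__init__()
        self.fc1 = nn.Linear(obs_space, 32)
        self.fc2 = nn.Linear(32, action_space)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        return x

net = ActorNet(4, 2)
print(net.fc2.weight.is_leaf)          # True: the Parameter tensor itself
print(net.fc2.weight.requires_grad)    # True
```

So the weight tensors report themselves as leaves even though the layer sits in the middle of the network, which is exactly the distinction I'm confused about.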