What is the purpose of `is_leaf`?

We need graph leaves so that we can compute gradients of the final tensor w.r.t. them. Leaf nodes are not the result of an operation; simply put, they have not been obtained from mathematical operations. For instance, in an nn.Linear(in, out) module, the weight and bias are leaf nodes, so when you call .backward() on a loss that uses this linear layer, the gradient of the loss is computed w.r.t. that weight and bias. In other words, all parameters of layers are leaf nodes.
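A minimal sketch of that nn.Linear case (the layer sizes and input here are made up, just for illustration):

import torch
import torch.nn as nn

linear = nn.Linear(4, 2)
print(linear.weight.is_leaf, linear.bias.is_leaf)  # True True, created directly, not by an op

loss = linear(torch.randn(3, 4)).sum()  # loss comes from operations, so it is not a leaf
loss.backward()

print(linear.weight.grad.shape)  # torch.Size([2, 4]), gradients accumulate on the leaves
print(linear.bias.grad.shape)    # torch.Size([2])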

Because .cuda() is an operation, its output is a non-leaf tensor, so yes, you have to call .retain_grad() on it if you want its .grad to be populated (there is a sketch of this further below).

Here is an example that might help:

import torch
import torch.nn as nn

conv = nn.Conv2d(1, 3, 2)  # its params require grad; we created it, so it is a leaf
print(conv.weight.requires_grad)  # True (its .grad is still None here because .backward() has not been called yet)
print(conv.weight.is_leaf)  # True
z = conv(torch.randn(1, 1, 10, 10)).sum()  # output of a math operation (sum), so not a leaf, but it requires grad
print(z.is_leaf)  # False
z.backward()
print(conv.weight.grad_fn)  # None, leaves have no grad_fn
print(z.grad_fn)  # <SumBackward0 object>
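
And to illustrate the .cuda() point from above, a minimal sketch (this assumes a CUDA device is available; the tensor and the loss are made up just for illustration):

import torch

w = torch.randn(3, requires_grad=True)  # created by us, so it is a leaf
w_cuda = w.cuda()                       # output of an operation -> non-leaf
print(w.is_leaf, w_cuda.is_leaf)        # True False

w_cuda.retain_grad()                    # ask autograd to keep the grad of this non-leaf
loss = (w_cuda * 2).sum()
loss.backward()

print(w.grad)       # populated, leaves always accumulate gradients
print(w_cuda.grad)  # populated only because we called retain_grad()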

That said, after reading a few posts, some questions came up for me. I think we need to look at some presentations about it (let’s google it!)

References:

  1. How do I calculate the gradients of a non-leaf variable w.r.t to a loss function?
  2. What are means of leaf variable and accumulated gradient?