- tensor.is_leaf == True
- tensor.requires_grad == True
- tensor.grad_fn is None; if it is not None, you need to call tensor.retain_grad() before .backward()
- gradient computation is not disabled using torch.no_grad()
- you are not running any non-differentiable operation
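To illustrate the torch.no_grad() item, here is a minimal sketch, assuming a recent PyTorch install. The same multiplication is recorded by Autograd outside the context manager and silently skipped inside it. The variable names are illustrative only.

```python
import torch

x = torch.ones(3, requires_grad=True)  # leaf tensor tracked by Autograd

y = x * 2  # recorded in the graph: y gets a grad_fn
with torch.no_grad():
    z = x * 2  # not recorded: gradient computation is disabled here

print(y.requires_grad, y.grad_fn is not None)  # True True
print(z.requires_grad, z.grad_fn)              # False None
```

Anything computed from z is cut off from x, so no gradient can reach x through that path.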
By default, Autograd populates gradients for a tensor t in t.grad only when t.is_leaf == True and t.requires_grad == True.
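A small sketch of this rule, assuming a recent PyTorch version: after backward(), only the leaf that requires grad has .grad populated. A leaf without requires_grad and a non-leaf intermediate both report None. The names w, c and h are illustrative.

```python
import warnings
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # leaf, requires_grad=True
c = torch.tensor([5.0, 5.0])                      # leaf, requires_grad=False
h = w * 3 + c                                     # non-leaf: produced by differentiable ops
h.sum().backward()

print(w.grad)  # tensor([3., 3.])
print(c.grad)  # None: c does not require grad

# Reading .grad of a non-leaf tensor returns None and emits a UserWarning
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    h_grad = h.grad
print(h_grad)  # None
```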
What is a leaf tensor?
Leaf tensors are tensors at the beginning of the computational graph, which means they are not the outputs of any differentiable operation. A model’s weights and biases, as well as any inputs to it, are all leaf tensors.
Outputs of hidden layers (activations) are not leaf tensors, because they are the result of a differentiable op (e.g. matmul()). You can see the operation that generated a tensor in tensor.grad_fn.
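As a quick check, assuming nn.Linear as a stand-in for "a model" (any module would do), you can inspect is_leaf and grad_fn directly. The weights, bias and input are leaves, while the layer output carries the backward node of the op that produced it. The exact grad_fn class name depends on the op and the PyTorch version, so it is not asserted here.

```python
import torch
import torch.nn as nn

layer = nn.Linear(4, 2)  # weight and bias are nn.Parameters
x = torch.randn(1, 4)    # model input
out = layer(x)           # activation: output of a differentiable op

print(layer.weight.is_leaf, layer.bias.is_leaf, x.is_leaf)  # True True True
print(out.is_leaf)       # False
print(out.grad_fn)       # backward node of the op that produced out
```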
Read more about leaf tensors: What is the purpose of `is_leaf`?
But I need the gradients of intermediate outputs!
If a tensor is created by an operation that Autograd treats as differentiable (including operations like .to() that don’t look differentiable), it is not a leaf tensor and will not have gradients accumulated by default.
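A minimal sketch of the .to() case, assuming the default float32 dtype for torch.randn. Converting a leaf to float64 produces a new tensor with a grad_fn, so the converted tensor is not a leaf. Gradients still flow back through the conversion to the original leaf.

```python
import torch

a = torch.randn(3, requires_grad=True)  # leaf (float32 by default)
b = a.to(torch.float64)                  # dtype conversion is recorded by Autograd

print(a.is_leaf, b.is_leaf)   # True False
print(b.grad_fn is not None)  # True

b.sum().backward()
print(a.grad)  # tensor([1., 1., 1.]): the gradient flows back through .to() to the leaf
```

Note that .to() with a dtype and device the tensor already has returns the tensor itself, so in that case nothing changes and it stays a leaf.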
You can explicitly instruct Autograd to accumulate gradients for tensors by calling tensor.retain_grad() before calling .backward(). See this thread for an example: Method grad returns None for a tensor
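Here is a small worked sketch of retain_grad() on an intermediate tensor, with values chosen so the gradients are easy to verify by hand. With h = x² and y = 3h, dy/dh = 3 and dy/dx = 3 · 2x = 12 at x = 2.

```python
import torch

x = torch.tensor([2.0], requires_grad=True)  # leaf
h = x ** 2                                    # intermediate (non-leaf) tensor
h.retain_grad()                               # keep h.grad; must be called before backward()
y = (3 * h).sum()
y.backward()

print(h.grad)  # tensor([3.])   dy/dh = 3
print(x.grad)  # tensor([12.])  dy/dx = 3 * 2x = 12 at x = 2
```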
!! Gotcha - Avoid calling .to() on nn.Parameters: the result is a plain non-leaf tensor, not an nn.Parameter, so it will not be registered as a parameter of the model. You will end up overwriting your leaf nn.Parameter with a non-leaf tensor.
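```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # .to() returns a new non-leaf tensor, not an nn.Parameter,
        # so self.param is stored as a plain attribute, not a parameter
        self.param = nn.Parameter(torch.randn(1)).to(torch.float64)

model = MyModel()
print(dict(model.named_parameters()))  # empty
```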
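One way to avoid the gotcha in the snippet above, sketched under the assumption that float64 is the target dtype: create the data in the desired dtype first, then wrap it in nn.Parameter, so the attribute is a registered leaf.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Create the data in the desired dtype first, then wrap it in nn.Parameter,
        # so the attribute is a registered leaf parameter
        self.param = nn.Parameter(torch.randn(1, dtype=torch.float64))

model = MyModel()
print(list(dict(model.named_parameters())))    # ['param']
print(model.param.is_leaf, model.param.dtype)  # True torch.float64
```

Converting the whole module after construction, e.g. model.to(torch.float64) or model.double(), also works, because Module.to() modifies the module in place and keeps its parameters registered.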
The output of non-differentiable operations will have requires_grad=False even if the inputs have requires_grad=True. Gradients cannot be computed through such an operation, and calling .backward() on its output raises the error
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
See this thread for an example: Custom loss function: gradients are None
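As an illustration, here is a minimal sketch that uses argmax as the non-differentiable operation (my choice of example; rounding, comparisons and integer casts behave similarly). The output has no grad_fn, casting it back to float does not reconnect it to the graph, and backward() raises the error quoted above. The exact message wording may vary across PyTorch versions.

```python
import torch

logits = torch.randn(4, requires_grad=True)
idx = logits.argmax()  # integer index: argmax is not differentiable
print(idx.requires_grad, idx.grad_fn)  # False None

loss = idx.float()     # casting back to float does not reconnect it to the graph
message = ""
try:
    loss.backward()
except RuntimeError as err:
    message = str(err)
print(message)  # element 0 of tensors does not require grad and does not have a grad_fn
```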