The reason you see this is that `a` in the first code snippet is a non-leaf Variable (i.e. it was not created by the user, but is the result of an operation). We do not store gradients of non-leaf Variables; they have to be accessed via hooks. In the second snippet, `a` is a leaf Variable, so its gradients are correctly populated.
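Here is a minimal sketch of the distinction, written against the current tensor API (`requires_grad` on tensors) rather than the old Variable wrapper; the names `x` and `y` are just illustrative:

```python
import torch

# x is a leaf: created directly by the user with requires_grad=True
x = torch.ones(3, requires_grad=True)

# y is a non-leaf: it is the result of an operation on x
y = x * 2

y.sum().backward()

print(x.grad)   # tensor([2., 2., 2.]) -- populated, because x is a leaf
print(y.grad)   # None -- gradients of non-leaf tensors are not retained
```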
From PyTorch version 0.1.10 onwards, the gradients of non-leaf Variables are actually None, so hopefully that's the clarification / better behavior you would want:
```python
>>> print a.grad
None
```
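If you do need the gradient of a non-leaf, a hook is one way to capture it. A rough sketch along the same lines (`register_hook` is the relevant call; everything else is illustrative):

```python
import torch

x = torch.ones(3, requires_grad=True)   # leaf
y = x * 2                                # non-leaf

grads = {}

def save_grad(grad):
    # Called with the gradient w.r.t. y during backward();
    # returning None leaves the gradient flowing through the graph unchanged.
    grads['y'] = grad

y.register_hook(save_grad)
y.sum().backward()

print(grads['y'])   # tensor([1., 1., 1.]) -- gradient of the sum w.r.t. y
print(y.grad)       # still None: non-leaf gradients are not stored on the tensor
```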