You see this because `a` in the first code snippet is a non-leaf Variable (i.e. it was not created by the user, but is the result of an operation).
Gradients of non-leaf Variables are not stored; they have to be accessed via hooks.
In the second snippet, `a` is a leaf Variable, so its gradients are correctly populated.
From PyTorch 0.1.10 onwards, the `.grad` of a non-leaf Variable is simply `None`, so hopefully that's the clarification / better behavior you want:
```python
>>> print(a.grad)
None
```
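To make this concrete, here is a minimal sketch using the current PyTorch tensor API (where `requires_grad=True` replaces the old `Variable` wrapper; the tensor names here are illustrative, not from the original snippets). It shows a leaf getting its `.grad` populated, a non-leaf staying `None`, and a hook capturing the non-leaf's gradient:

```python
import torch

# Leaf tensor: created directly by the user, so .grad is populated
# after backward().
x = torch.ones(2, requires_grad=True)

# Non-leaf tensor: the result of an operation, so its .grad stays None
# by default.
a = x * 2

# To observe the gradient of a non-leaf, register a hook
# (alternatively, call a.retain_grad() before backward()).
grads = {}
a.register_hook(lambda g: grads.setdefault('a', g))

a.sum().backward()

print(x.grad)       # leaf gradient is stored: tensor([2., 2.])
print(a.grad)       # non-leaf gradient is not stored: None
print(grads['a'])   # gradient captured via the hook: tensor([1., 1.])
```

Note that newer PyTorch versions also emit a `UserWarning` when you access `.grad` on a non-leaf tensor, pointing you at `retain_grad()` for exactly this reason.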