Strange behavior of Variable.cuda() and Variable.grad

The reason you see this is that `a` in the first code snippet is a non-leaf Variable (i.e. it was not created by the user, but is the result of an operation).
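
To illustrate (the original snippets aren't quoted here, so this is a minimal reconstruction of the situation, and it needs a CUDA-capable machine): calling `.cuda()` on a Variable is itself an operation, so its output is non-leaf.

```python
import torch
from torch.autograd import Variable

# .cuda() is an operation, so its output is a new, non-leaf Variable
a = Variable(torch.randn(2, 2), requires_grad=True).cuda()
a.sum().backward()
print(a.grad)  # None -- .grad is not populated for non-leaf Variables
```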

We do not store gradients of non-leaf Variables; they have to be accessed via hooks.
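
For example, `register_hook` lets you observe the gradient flowing through a non-leaf Variable during the backward pass (a small sketch, using a CPU op for simplicity):

```python
import torch
from torch.autograd import Variable

a = Variable(torch.randn(2, 2), requires_grad=True)
b = a * 2  # b is non-leaf: it is the result of an operation

# The hook fires during backward() with the gradient w.r.t. b;
# returning None leaves the gradient unmodified
b.register_hook(lambda grad: print(grad))

b.sum().backward()
print(b.grad)  # still None -- the hook is the only way to see b's gradient
print(a.grad)  # populated, since a is a leaf
```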

In the second snippet, `a` is a leaf Variable, so its gradients are correctly populated.
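
Again a guess at what the second snippet looks like: if you move `.cuda()` inside the `Variable(...)` call, so it runs on the tensor before wrapping, `a` stays a leaf and `.grad` gets filled in:

```python
import torch
from torch.autograd import Variable

# The tensor is moved to the GPU *before* wrapping, so `a` itself
# is user-created and therefore a leaf Variable
a = Variable(torch.randn(2, 2).cuda(), requires_grad=True)
a.sum().backward()
print(a.grad)  # a 2x2 Variable of ones
```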

From PyTorch version 0.1.10 onwards, the `.grad` of non-leaf Variables is actually None, so hopefully that's the clearer, better behavior you would want:

```python
>>> print(a.grad)
None
```