But I cannot understand the appearance of 2 in Case 1… In fact, I think the difference is: for Case 1, I set requires_grad = True at creation; for Case 2, I set requires_grad = True after creation. Why are these two cases not the same?

The ordering of requires_grad and the cuda() call matters. Once requires_grad = True is set, any subsequent operation on the tensor (such as x.cuda() or x * 2) is recorded and treated as a differentiable operation.

Here's another example to make this clearer. You could substitute .cuda() for * 2.

>>> x = torch.tensor([1.0], requires_grad=True)
>>> y = x * 2 # is_leaf = False
>>> print(y)
tensor([2.], grad_fn=<MulBackward0>)

>>> x = torch.tensor([1.0])
>>> y = x * 2
>>> print(y)
tensor([2.])
>>> y.requires_grad = True # is_leaf = True
>>> print(y)
tensor([2.], requires_grad=True)

In the first case, y is treated as a function of x, since x.requires_grad=True. You can see it has a grad_fn, so is_leaf is False.

In the second case, y is treated as independent of x because x did not have requires_grad=True. Since it's independent of x, it has no grad_fn, so is_leaf is True.
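The practical consequence shows up when you call backward(): gradients flow back to x only in the first case. A small sketch, again using * 2 in place of .cuda():

```python
import torch

# Case 1: requires_grad set at creation; the multiply is recorded
x1 = torch.tensor([1.0], requires_grad=True)
y1 = x1 * 2                # non-leaf, has grad_fn
y1.backward()
print(x1.grad)             # tensor([2.]): dy/dx = 2 reaches x1

# Case 2: requires_grad set after the multiply; nothing was recorded
x2 = torch.tensor([1.0])
y2 = x2 * 2                # built without tracking, no grad_fn
y2.requires_grad = True    # y2 becomes a new leaf
y2.backward()
print(x2.grad)             # None: x2 is not part of y2's graph
```

In Case 2 the gradient accumulates into y2 itself (it is the leaf), and x2 is never touched.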