# How to define a leaf tensor in PyTorch 0.4.1

Test environment: pytorch 0.4.1

Case 1:
```python
# 1
>>> torch.zeros([1, 2], dtype=torch.float, requires_grad=True).is_leaf
True
# 2
>>> torch.zeros([1, 2], dtype=torch.float, requires_grad=True).cuda().is_leaf
False
# 3
>>> torch.zeros([1, 2], dtype=torch.float).cuda().is_leaf
True
```

Case 2:
```python
y = torch.zeros([1, 2])  # is_leaf = True
y = y.cuda()             # is_leaf = True
y.requires_grad = True   # is_leaf = True
```
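The two cases can be reproduced without a GPU. In the sketch below, `* 2` stands in for `.cuda()` (both count as operations on the tensor as far as autograd is concerned); this is an assumption for illustration, not the exact snippets above.

```python
import torch

# Case 1, snippet 1: created directly with requires_grad=True -> leaf
a = torch.zeros([1, 2], dtype=torch.float, requires_grad=True)
assert a.is_leaf

# Case 1, snippet 2: an op applied to a tensor that already requires
# grad produces a non-leaf (here `* 2` replaces `.cuda()` so the sketch
# runs on CPU)
b = torch.zeros([1, 2], dtype=torch.float, requires_grad=True) * 2
assert not b.is_leaf

# Case 2: requires_grad is set only after all ops -> still a leaf
y = torch.zeros([1, 2]) * 2
y.requires_grad = True
assert y.is_leaf
```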

It seems that snippet 2 of Case 1 does not behave the same as Case 2! Why?

Hi,

It's because Case 2 is the same as snippet 3 of Case 1: you don't perform any operation on `y` while it requires gradients, so it's always a leaf.

But I cannot understand the behavior of snippet 2 of Case 1… In fact, I think the difference is: in snippet 2 of Case 1, I set `requires_grad = True` at creation; in Case 2, I set `requires_grad = True` after creation. Why are these two cases not the same?

The ordering of `requires_grad` and the `cuda()` call matters. After `requires_grad = True`, any operation (such as `x.cuda()` or `x * 2`) is recorded and treated as a differentiable operation.

Here's another example to make this clearer. You could substitute `.cuda()` for `* 2`.

```python
>>> x = torch.tensor([1.0], requires_grad=True)
>>> y = x * 2  # is_leaf = False
>>> print(y)
```
```python
>>> x = torch.tensor([1.0])
>>> y = x * 2
>>> print(y)
tensor([2.])
>>> y.requires_grad = True  # is_leaf = True
>>> print(y)
```

In the first case, `y` is treated as a function of `x` since `x.requires_grad=True`. You can see it has a `grad_fn`, so `is_leaf` is `False`.

In the second case, `y` is treated as independent of `x` because `x` did not have `requires_grad=True`. Because it's independent of `x`, it doesn't have a `grad_fn`, so `is_leaf` is `True`.
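The `grad_fn` / `is_leaf` relationship described above can be checked directly. A minimal sketch, assuming `torch` is installed (no GPU needed):

```python
import torch

# First case: x requires grad, so the multiplication is recorded
x = torch.tensor([1.0], requires_grad=True)
y = x * 2
assert y.grad_fn is not None  # the op was recorded
assert not y.is_leaf

# Second case: x does not require grad, so nothing is recorded before
# requires_grad is turned on; y stays a leaf with no grad_fn
x2 = torch.tensor([1.0])
y2 = x2 * 2
y2.requires_grad = True
assert y2.grad_fn is None
assert y2.is_leaf
```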


Thanks a lot, I get it. I also found a small difference between 0.3.1 and 0.4.1:

0.3.1:

```python
>>> a = Variable(torch.randn(3, 3)).cuda()
>>> a.is_leaf
False
```

0.4.1:

```python
>>> a = Variable(torch.randn(3, 3)).cuda()
>>> a.is_leaf
True
```

This seems more reasonable, because the default value of `requires_grad` for a `Variable` is `False`.
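The same reasoning applies without the (now deprecated) `Variable` wrapper. A minimal CPU-only sketch, with `* 2` standing in for `.cuda()` so it runs without a GPU:

```python
import torch

# requires_grad defaults to False, so an op on such a tensor records
# no history and the result is itself a leaf (the 0.4.1 behavior above)
a = torch.randn(3, 3)
assert a.requires_grad is False

b = a * 2  # stands in for .cuda(); still just an op on a no-grad tensor
assert b.is_leaf
assert b.grad_fn is None
```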