In the first example, x is a leaf tensor: a tensor you created directly rather than obtained from an operation. By default, gradients are accumulated only in leaf tensors, so after backward() runs, x.grad holds actual gradient values.

In the second example, x is not a leaf tensor: you obtained it by calling a function (to) on another tensor rather than creating it directly. By default, gradients are accumulated only in leaf tensors, so x.grad stays None.
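You can check this distinction programmatically: every tensor exposes an is_leaf attribute. A minimal sketch (the tensor names here are just illustrative):

```python
import torch

a = torch.zeros(3, requires_grad=True)  # created directly -> leaf
b = a * 2                               # produced by an operation -> not a leaf

print(a.is_leaf)  # True
print(b.is_leaf)  # False
```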

Here is some code that may help make this distinction clearer:

import torch

device = torch.device("cuda")  # assumes a GPU; if device matches x's device, to() returns x itself and y is still a leaf

x = torch.zeros(10, 300, requires_grad=True)
print(x.grad)   # None: backward() has not run yet
y = x.to(device)
loss = torch.max(y)
loss.backward(retain_graph=True)
print(y.grad)   # None (with a warning): y is not a leaf, so its gradient was not retained
print(x.grad)   # a 10x300 tensor of gradients

You can make PyTorch retain gradients for a non-leaf tensor by calling retain_grad() on that tensor before backward(). Here is the same example as above, with an extra call to y.retain_grad():

import torch

device = torch.device("cuda")  # assumes a GPU, as in the previous example

x = torch.zeros(10, 300, requires_grad=True)
print(x.grad)   # None: backward() has not run yet
y = x.to(device)
y.retain_grad()  # ask autograd to populate y.grad even though y is not a leaf
loss = torch.max(y)
loss.backward(retain_graph=True)
print(y.grad)   # a 10x300 tensor: y's gradient was retained
print(x.grad)   # a 10x300 tensor of gradients