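Oh, I should have read the docs in more detail. They say that to() returns a copy when the tensor is not already on the requested device (or in the requested dtype), so I assume that in this case the original tensor created on the CPU is the parent of the one that is moved to the GPU, which would explain why the GPU tensor is not a leaf.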
This works:
test = torch.zeros((10,10)).to(data.device).detach().requires_grad_(True)
print(test.is_leaf) # True
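For anyone who wants to check this themselves, here is a small sketch of the three cases as I understand them. The variable names are just for illustration, and I use a dtype change in the first case as a stand-in for the CPU-to-GPU move so that the copy also happens on a machine without a GPU (to() returns the same tensor when nothing needs converting).

```python
import torch

# Use a GPU if one is present; otherwise everything stays on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Case 1: requires_grad is set before the conversion. The dtype change
# forces to() to make a copy, and autograd records that copy as an operation,
# so the result has a grad_fn and is not a leaf.
parent = torch.zeros((10, 10), requires_grad=True)
child = parent.to(torch.float64)
parent_is_leaf = parent.is_leaf
child_is_leaf = child.is_leaf
child_has_grad_fn = child.grad_fn is not None

# Case 2: create the tensor directly on the target device, so no copy
# is involved at all.
direct = torch.zeros((10, 10), device=device, requires_grad=True)
direct_is_leaf = direct.is_leaf

# Case 3: move first, switch gradients on afterwards.
late = torch.zeros((10, 10)).to(device).requires_grad_(True)
late_is_leaf = late.is_leaf

print(parent_is_leaf, child_is_leaf, child_has_grad_fn)
print(direct_is_leaf, late_is_leaf)
```

As far as I can tell, the detach() in my line above is not strictly needed, because torch.zeros() does not track gradients by default, but it does no harm. Passing device= at creation (case 2) looks like the simplest option.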