expand() memory savings

I am fairly new to PyTorch, so sorry for a newbie question.
How does PyTorch handle calling .to(device) on an object that was created by calling expand(...) on a tensor? Does it create a (small/minimal) underlying tensor on the device and then build a view on top of it, or does it copy the view as if it were a regular tensor, allocating much more device memory?
If the former is the case, does it copy just the tensor data visible through the view, or the whole underlying tensor the view points to?
If the latter is the case, it would make sense to call expand after the tensor is copied to the device, right?
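In other words, the memory-friendly pattern I have in mind would look something like this (a sketch; the variable names are mine, and it falls back to CPU when no CUDA device is available):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Copy only the small base tensor to the device...
base = torch.randn(3, 2, 1).to(device)
# ...then expand on the device: expand() returns a zero-copy view
# over the same storage (stride 0 along the expanded dimension).
big = base.expand(-1, -1, 4)

# The view adds no storage of its own.
print(big.untyped_storage().nbytes() == base.untyped_storage().nbytes())  # True
```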

import torch

x = torch.randn(3, 2, 1)
print(x.stride())      # (2, 1, 1)

y = x.expand(-1, -1, 4)
print(y.stride())      # (2, 1, 0) -- the expanded dim has stride 0, no extra memory

x_gpu = x.to(0)
print(x_gpu.stride())  # (2, 1, 1)

y_gpu = y.to(0)
print(y_gpu.stride())  # (8, 4, 1) -- contiguous, so the copy was materialized

I guess that when calling the to() method, torch just creates a contiguous tensor on the new device, which would probably consume more memory than the original view.
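One way to check this without a GPU is to compare storage sizes: calling .contiguous() materializes a view, which appears to be what the cross-device copy does here, given the (8, 4, 1) stride above (assuming the default float32 dtype, 4 bytes per element):

```python
import torch

x = torch.randn(3, 2, 1)
y = x.expand(-1, -1, 4)

# The expanded view shares x's storage: 6 elements * 4 bytes = 24 bytes.
print(x.untyped_storage().nbytes())  # 24
print(y.untyped_storage().nbytes())  # 24 -- same storage, no extra memory

# Materializing the view allocates the full tensor:
z = y.contiguous()
print(z.untyped_storage().nbytes())  # 96 -- 3*2*4 elements * 4 bytes
```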