Ok, this is not a good example, because I could just move it to the same device as x.
But the problem I actually face is that I have to define the weights myself in the __init__ function:

self.w = Parameter(torch.zeros(out_features, in_features))
At that point I don't know the input's device yet, so self.w ends up on the CPU while my input is on the GPU.
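A minimal sketch of one common way around this, assuming the weight is assigned as an nn.Parameter attribute as above. A parameter registered that way does not need to know the device in __init__, because calling .to(device) on the module moves every registered parameter and buffer. Only tensors created fresh inside forward need to follow the input, e.g. via device=x.device. The class name MyLinear and the ones-valued bias are hypothetical, just for illustration.

```python
import torch
import torch.nn as nn


class MyLinear(nn.Module):
    # Hypothetical module name; the weight definition mirrors the one in the post.
    def __init__(self, in_features, out_features):
        super().__init__()
        # Assigning an nn.Parameter as an attribute registers it with the module.
        # It starts on the CPU, but module.to(device) will move it later.
        self.w = nn.Parameter(torch.zeros(out_features, in_features))

    def forward(self, x):
        # Hypothetical bias: any *new* tensor created inside forward is placed
        # on the input's device (and dtype) explicitly.
        bias = torch.ones(self.w.shape[0], device=x.device, dtype=x.dtype)
        return x @ self.w.t() + bias


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = MyLinear(in_features=3, out_features=2)
model.to(device)  # moves self.w (and any other registered parameters/buffers)

x = torch.randn(4, 3, device=device)
y = model(x)
```

If a tensor should move with the module but not be trained, registering it with self.register_buffer(...) in __init__ gives the same .to(device) behaviour without making it a parameter.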