Should I send scalar tensor to GPU?

Hello,

Please read the two implementations below. Which one is preferred for GPU use?
General question: should I send a scalar to the GPU?

import torch
import torch.nn as nn

class Net1(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('tau', torch.tensor(0.5))
        
    def forward(self, x):
        return self.tau*x
#------------------------------------------------------------------
class Net2(nn.Module):
    def __init__(self):
        super().__init__()
        self.tau=0.5
        
    def forward(self, x):
        return self.tau*x
#------------------------------------------------------------------
model=Net1()
model.to('cuda')
#or
model=Net2()
model.to('cuda')
# will self.tau be sent to the GPU 'automatically' before computing tau*x ?

When you move the model to the GPU, all registered parameters and buffers are moved with it, and the computation then happens on the GPU. So in Net1, self.tau is a registered buffer and is moved to the GPU automatically.
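You can check this yourself: a buffer registered via register_buffer shows up in named_buffers() and state_dict() and is moved by .to(), while a plain float attribute is untouched by .to(). A minimal sketch (reusing the question's Net1/Net2, and falling back to CPU when CUDA is unavailable):

```python
import torch
import torch.nn as nn

class Net1(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer('tau', torch.tensor(0.5))
    def forward(self, x):
        return self.tau * x

class Net2(nn.Module):
    def __init__(self):
        super().__init__()
        self.tau = 0.5
    def forward(self, x):
        return self.tau * x

device = 'cuda' if torch.cuda.is_available() else 'cpu'
m1 = Net1().to(device)
m2 = Net2().to(device)

print(type(m1.tau), m1.tau.device)        # a tensor, moved with the module
print(type(m2.tau))                       # a plain float, untouched by .to()
print('tau' in m1.state_dict())           # True: buffers are saved/loaded
print('tau' in m2.state_dict())           # False: invisible to state_dict
```

A side effect worth noting: only the buffer version participates in state_dict saving/loading, which matters if tau is ever meant to change.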

Will self.tau=0.5 (the plain Python float in Net2) be moved to the GPU automatically?

IIRC, Python scalars are handled by specialized Scalar overloads in the C++ dispatcher, so they just end up as plain arguments to the CUDA kernel functions, and CUDA automatically copies kernel arguments (pointers and scalars) to the GPU.
So scalars may even be marginally faster than buffers, but I'm not sure.
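In practice this means a plain Python scalar can multiply a tensor on whatever device the tensor lives on, with no explicit transfer on your side, and it gives the same result as a scalar tensor that was moved explicitly. A quick sketch:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.ones(3, device=device)

y = 0.5 * x          # Python float: handled as a scalar kernel argument
print(y.device)      # same device as x; nothing was moved by hand

tau = torch.tensor(0.5, device=device)  # buffer-style scalar tensor
z = tau * x
print(torch.equal(y, z))  # True: both compute 0.5 * x
```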

Consider this: when you move your model to the GPU, all the weights of the model are moved to the GPU. Any computation on those weights then has to happen on the GPU, because GPU-CPU-GPU transfers are expensive. So all the code of your model in turn runs on the GPU.
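As an illustration of why those transfers matter (my example, not from the thread): every .item() call on a GPU tensor forces a device-to-host copy and a synchronization, so it is cheaper to accumulate intermediate results as tensors on the device and transfer once at the end. A sketch:

```python
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'
losses = [torch.rand((), device=device) for _ in range(100)]

# Costly pattern: one GPU->CPU transfer (and sync) per iteration.
total_slow = 0.0
for l in losses:
    total_slow += l.item()

# Cheaper pattern: accumulate on the device, transfer once.
total_fast = torch.stack(losses).sum().item()

print(abs(total_slow - total_fast) < 1e-3)  # same result, far fewer transfers
```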