What is the performance difference between the following expressions?

a = torch.tensor(3, dtype=torch.float32, device='cuda')
b = torch.tensor(2, dtype=torch.int32, device='cuda')

What is the performance difference between b.float() * a and 2*a?

I am not familiar enough with what is going on under the hood to judge. Without understanding much, I would think that minimizing CPU-to-GPU transfer would be fastest, so b.float() * a should be faster?

Or is the conversion cost negligible such that I can just keep switching?

2 * a will probably be faster, since the Python scalar 2 is passed directly as an argument to the multiplication kernel, while b.float() first launches a separate cast kernel and allocates a new float tensor.
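
If you want to see this concretely, a rough profiling sketch along these lines should show the extra cast kernel (assuming a CUDA device is available; exact kernel names and timings depend on your PyTorch version and GPU):

import torch
from torch.profiler import profile, ProfilerActivity

a = torch.tensor(3, dtype=torch.float32, device='cuda')
b = torch.tensor(2, dtype=torch.int32, device='cuda')

# Warm up so one-time CUDA initialization does not show up in the trace
_ = b.float() * a
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    _ = b.float() * a  # cast kernel + multiply kernel
    _ = 2 * a          # single multiply kernel; 2 goes in as a scalar argument
torch.cuda.synchronize()

print(prof.key_averages().table(sort_by="cuda_time_total"))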

Either way, both cases should be fast.
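
If you want to measure it on your own hardware, a minimal timing sketch with torch.utils.benchmark (which takes care of CUDA synchronization for you) could look like this; the setup mirrors the tensors above and assumes a CUDA device:

import torch
from torch.utils import benchmark

a = torch.tensor(3, dtype=torch.float32, device='cuda')
b = torch.tensor(2, dtype=torch.int32, device='cuda')

# Time the cast-then-multiply version
t_cast = benchmark.Timer(stmt='b.float() * a', globals={'a': a, 'b': b})
# Time the scalar-multiply version
t_scalar = benchmark.Timer(stmt='2 * a', globals={'a': a})

print(t_cast.timeit(1000))
print(t_scalar.timeit(1000))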