a = torch.tensor(3, dtype=torch.float32, device='cuda')
b = torch.tensor(2, dtype=torch.int32, device='cuda')
What is the performance difference between b.float() * a and 2*a?
I am not familiar with what is going on under the hood, so I can't judge this myself. Without knowing much, I would guess that minimizing CPU-to-GPU conversion would be fastest, and so b.float() * a should be faster?
Or is the conversion cost negligible, such that I can just keep switching between dtypes?
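For context, here is a minimal runnable version of the setup I'm asking about (falling back to CPU when CUDA isn't available is my own addition, just so the snippet runs anywhere):

```python
import torch

# Assumption for illustration: fall back to CPU if no GPU is present.
device = 'cuda' if torch.cuda.is_available() else 'cpu'

a = torch.tensor(3, dtype=torch.float32, device=device)
b = torch.tensor(2, dtype=torch.int32, device=device)

# Option 1: explicit cast. b.float() allocates a new float32 tensor
# on the same device as b -- a dtype cast, not a CPU<->GPU transfer.
r1 = b.float() * a

# Option 2: Python scalar. The scalar 2 is promoted to a's dtype
# during the multiply.
r2 = 2 * a

assert r1.dtype == torch.float32 and r2.dtype == torch.float32
assert torch.equal(r1, r2)
```

Both variants produce the same float32 result; my question is only about which one costs more.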