Simple timing question when casting to float on GPU

Hi guys,

How efficient is the following?

rewards = torch.as_tensor(rewards).to(self.device).float()

vs

rewards = torch.as_tensor(rewards).float().to(self.device)

What is the best way to benchmark that given all the randomness with GPU?

Lastly, if a tensor x on the GPU is already a float and I then apply another x = x.float() operation, is that efficient? In other words, does PyTorch know it’s already float and skip the cast, or does that result in significant overhead that should be avoided?

Thanks!


Hi Muppet!

Speaking as a non-expert, I would think that

rewards = torch.as_tensor(rewards, dtype=torch.float, device=torch.device('cuda'))

would be the way to go. I believe that pytorch will “no-op” any unneeded casts.

Note, if rewards starts out as, for example, a numpy double array (and self.device is the gpu), I think converting to float first and then moving it to the gpu would win, because bandwidth to the gpu is a bottleneck, and moving the float would only involve half as much data as moving the double.
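
If you want to time the two orderings yourself, keep in mind that cuda calls are asynchronous, so you need to call torch.cuda.synchronize() before reading the clock. Here is a minimal sketch, assuming a cuda device is available; the array size and iteration counts are just made-up illustration values:

import time
import numpy as np
import torch

device = torch.device('cuda')
rewards = np.random.rand(1_000_000)  # a numpy double (float64) array, as in the example above

def cast_then_move(r):
    # cast to float32 on the cpu, then transfer half as many bytes
    return torch.as_tensor(r).float().to(device)

def move_then_cast(r):
    # transfer the full float64 array, then cast on the gpu
    return torch.as_tensor(r).to(device).float()

for fn in (cast_then_move, move_then_cast):
    for _ in range(5):          # warm-up (cuda context, allocator)
        fn(rewards)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(20):
        fn(rewards)
    torch.cuda.synchronize()    # wait for the gpu before stopping the clock
    print(fn.__name__, (time.perf_counter() - start) / 20)

torch.utils.benchmark.Timer is another option and handles the cuda synchronization for you.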

I’m pretty sure that pytorch knows enough to skip redundant casts.
(May the experts correct me if I’m wrong.)
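
As a quick check of that (a small sketch, assuming a cuda device): if the dtype already matches, .float() should simply hand back the same tensor, so the "is" comparison comes back True:

import torch

x = torch.zeros(10, device='cuda')  # already float32 on the gpu
print(x.float() is x)               # True: the redundant cast is skipped entirely
print(x.double() is x)              # False: a real cast allocates a new tensor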

Best.

K. Frank


Thanks for the reply. Yeah, it would be really interesting if someone could confirm this for sure.

Yes, as @KFrank explained, no copy/transform will be triggered if the data is already in the desired type.
Also, as_tensor will share the data of the underlying numpy array (if you leave it on the CPU):

import numpy as np
import torch

# numpy arrays default to float64, so asking for torch.float64 lets
# as_tensor share the underlying memory instead of copying
x = np.zeros((2, 2))
y = torch.as_tensor(x, dtype=torch.float64)
print(x, '\n', y)

# writing to the tensor therefore also changes the numpy array
y[0, 0] = 1.
print(x, '\n', y)

# Changing the data type will create a copy
x = np.zeros((2, 2))
y = torch.as_tensor(x, dtype=torch.float32)
print(x, '\n', y)

# now the numpy array is unaffected by the write
y[0, 0] = 1.
print(x, '\n', y)
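
You can verify the no-copy claim the same way, since as_tensor returns the input tensor itself when the dtype and device already match (a small sketch):

t = torch.zeros(2, 2)                                # float32 on the CPU
print(torch.as_tensor(t, dtype=torch.float32) is t)  # True: nothing is copied or cast
print(torch.as_tensor(t, dtype=torch.float64) is t)  # False: the cast creates a new tensor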