CPU and CUDA float32 calculations are different

Can you help explain?

import torch

(10000 ** (torch.tensor(2.0, dtype=torch.float32, device='cpu') / 128)).tolist()
# 1.1547819375991821

(10000 ** (torch.tensor(2.0, dtype=torch.float32, device='cuda:0') / 128)).tolist()
# 1.1547820568084717

Simplified version:

(10000 ** torch.tensor(0.015625, dtype=torch.float32, device='cpu')).tolist()
# 1.1547819375991821

(10000 ** torch.tensor(0.015625, dtype=torch.float32, device='cuda:0')).tolist()
# 1.1547820568084717
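
Incidentally, the two printed values are adjacent float32 numbers, i.e. they differ by exactly one ULP (2⁻²³ for values in [1, 2)). A quick standard-library check, no torch or GPU needed (the helper names here are my own):

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def next_f32(x):
    """Return the next representable float32 above x."""
    bits = struct.unpack('I', struct.pack('f', x))[0]
    return struct.unpack('f', struct.pack('I', bits + 1))[0]

cpu_result = to_f32(1.1547819375991821)
cuda_result = to_f32(1.1547820568084717)

# The two results are neighboring float32 values, one ULP apart.
print(next_f32(cpu_result) == cuda_result)  # True
print(cuda_result - cpu_result)             # 2**-23, about 1.19e-07
```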

This is expected, since different architectures can use different algorithms/implementations for the same operation.
Both results are valid and show a comparably small error relative to a wider-dtype (float64) reference:

torch.set_printoptions(precision=15)


x_cpu = (10000 ** torch.tensor(0.015625, dtype=torch.float32, device='cpu'))
print(x_cpu)
# tensor(1.154781937599182)

x_gpu = (10000 ** torch.tensor(0.015625, dtype=torch.float32, device='cuda:0'))
print(x_gpu)
# tensor(1.154782056808472, device='cuda:0')

x_ref = (10000 ** torch.tensor(0.015625, dtype=torch.float64, device='cpu'))
print(x_ref)
# tensor(1.154781984689458, dtype=torch.float64)

err_cpu = (x_ref - x_cpu.double())
print(err_cpu)
# tensor(4.709027612292971e-08, dtype=torch.float64)

err_gpu = (x_ref - x_gpu.cpu().double())
print(err_gpu)
# tensor(-7.211901342785154e-08, dtype=torch.float64)
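
To make "both are correct" concrete: both absolute errors are below one float32 ULP at this magnitude (2⁻²³ ≈ 1.19e-07), which is the usual accuracy expectation for a float32 pow. A sketch using only the standard library, recomputing the reference with Python's float64 pow (so the exact reference digits may differ in the last place from torch's):

```python
# float64 reference for 10000 ** (1/64); 0.015625 is exactly representable
ref = 10000.0 ** 0.015625

ulp_f32 = 2.0 ** -23  # spacing of float32 values in [1, 2)

err_cpu = ref - 1.1547819375991821   # CPU float32 result from above
err_gpu = ref - 1.1547820568084717   # CUDA float32 result from above

# Both results are within one ULP of the float64 reference.
print(abs(err_cpu) < ulp_f32)  # True
print(abs(err_gpu) < ulp_f32)  # True

# Only the CPU result is within half an ULP, i.e. it appears to be the
# correctly rounded value; the CUDA result is off by one ULP, which is
# still a typical error bound for transcendental functions like pow.
print(abs(err_cpu) < ulp_f32 / 2)  # True
print(abs(err_gpu) < ulp_f32 / 2)  # False
```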