import torch
torch.manual_seed(0)
x = torch.ones(1000000).half()
print(x.mean())
print(x.cuda().mean())
Output:

tensor(nan, dtype=torch.float16)
tensor(1., device='cuda:0', dtype=torch.float16)

Why does the same mean() return nan on the CPU but 1. on the GPU?
My guess is that the CPU reduction kernel doesn't use float32 to hold the intermediate sum, so accumulating a million values directly in float16 overflows or loses precision, while the GPU kernel accumulates in float32. Mixed-precision training with float16 is mostly a GPU workflow; if I'm not mistaken, bfloat16 is the preferred 16-bit format on the CPU. Let's wait for others to chime in and correct me.
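One way to see the dynamic-range problem, and a common workaround, is to up-cast to float32 before the reduction. This is only a sketch: whether the CPU kernel really accumulates in float16 depends on the PyTorch version, but the float16 range limit itself is easy to demonstrate (its maximum finite value is 65504):

```python
import torch

x = torch.ones(1_000_000).half()

# Workaround: accumulate in float32, then cast back if a half result is needed.
mean_f32 = x.float().mean()
print(mean_f32)          # tensor(1.)
print(mean_f32.half())   # tensor(1., dtype=torch.float16)

# float16 has a small dynamic range (max finite value ~65504), so a sum
# that exceeds it overflows to inf when the accumulator itself is float16:
big = torch.tensor(60000.0, dtype=torch.float16)
print(big + big)         # tensor(inf, dtype=torch.float16)
```

Once the running sum is inf, downstream arithmetic can easily produce nan (e.g. inf - inf or inf * 0), which would explain the nan mean on the CPU.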