Why do I get different results when multiplying on the CPU than on the GPU?

Thanks, you are right about float64. The number of differing digits is similar (it depends on the experiment), but the two results are now much closer to each other.

import numpy as np
import torch

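# Recent PyTorch versions require 1-D tensors for dot(), so build flat
# vectors with the same 5000 * 100000 elements (about 4 GB each in float64).
a = torch.from_numpy(np.random.rand(5000 * 100000).astype(np.float64))
b = torch.from_numpy(np.random.rand(5000 * 100000).astype(np.float64))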

c = a.cuda()
d = b.cuda()

print(a.dot(b))
print(c.dot(d))

125000868.65247717
125000868.65247723
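
A difference in only the last digit is what you would expect if the two devices add the products in a different order, since floating-point addition is not associative. Here is a minimal sketch of that effect in pure Python. Note that `chunked_dot` is a hypothetical stand-in for a parallel reduction that I made up for illustration; it is not how PyTorch or CUDA actually schedule the sum.

```python
import math
import random

# Floating-point addition is not associative: grouping changes the last bit.
lhs = (0.1 + 0.2) + 0.3
rhs = 0.1 + (0.2 + 0.3)
print(lhs == rhs)  # False

def sequential_dot(x, y):
    """Accumulate the products left to right, like a simple single loop."""
    total = 0.0
    for xi, yi in zip(x, y):
        total += xi * yi
    return total

def chunked_dot(x, y, chunks=8):
    """Sum each chunk separately, then combine the partial sums.

    Hypothetical stand-in for a parallel reduction (an assumption, not
    the real GPU kernel).
    """
    n = len(x)
    step = (n + chunks - 1) // chunks
    partials = [
        sequential_dot(x[i:i + step], y[i:i + step])
        for i in range(0, n, step)
    ]
    return sum(partials)

random.seed(0)
x = [random.random() for _ in range(100_000)]
y = [random.random() for _ in range(100_000)]

seq = sequential_dot(x, y)
par = chunked_dot(x, y)
# math.fsum adds the (already rounded) products without accumulating
# further rounding error, so it serves as a reference value here.
ref = math.fsum(xi * yi for xi, yi in zip(x, y))

print(seq, par, ref)
print("relative difference:", abs(seq - par) / abs(ref))
```

Both orderings are equally valid float64 answers. Any gap between them should be on the order of the rounding error, which fits the last-digit difference in the output above.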