Hi All,
I am porting a computational graph from TensorFlow v1 to PyTorch and have hit an issue with my float32 data.
The example below illustrates the problem: with float32 tensors I get a result vector of either 37059.996 or 37061.0, depending on which hardware does the calculation.
# set up arrays in numpy:
A = np.repeat(np.array([[1.0], [33.0], [0.0], [1089.0], [0.0], [0.0], [35937.0], [0.0], [0.0], [0.0], [0.9991], [0.0]]), 1000, axis=1).T.astype('float32')
B = np.ones((12, 1), dtype='float32')
# show the dot product of the arrays.
print(np.dot(A,B))
# take the arrays to the GPU and do the same math
A_gpu = torch.tensor(A, dtype=torch.float32).to('cuda')
B_gpu = torch.tensor(B, dtype=torch.float32).to('cuda')
gpu_result = torch.mm(A_gpu, B_gpu).to('cpu').detach().numpy()
print(gpu_result)
# show the difference between the two approaches:
print(np.dot(A, B) - gpu_result)
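For what it's worth, the order-sensitivity of float32 accumulation can be shown without any GPU at all; a minimal sketch in plain NumPy (the 1e8/1.0 values here are just a standard illustration, not taken from my data):

```python
import numpy as np

# float32 addition is not associative: a small value can be absorbed by a
# much larger one, so the order of accumulation changes the result.
big = np.float32(1e8)  # adjacent float32 values near 1e8 are 8.0 apart
one = np.float32(1.0)

print((big + one) - big)  # the 1.0 is absorbed into 1e8 first -> 0.0
print((big - big) + one)  # same three terms, different order -> 1.0
```

Since CPU and GPU matrix multiplies accumulate in different orders (and with different blocking), some divergence like this is expected even when both use float32.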
I’m completely open to going through my code and switching to torch.float64. However, before I do, could you advise:
- is switching to float64 the best solution, or is there a ‘magic fix’ that newer users like me might not be aware of?
- does anyone have experience with the kind of performance impact the float32-to-float64 change incurs?
- is this inconsistency between GPU and CPU computation understood, and is there any documentation on the error bounds one might expect? (I get that the CUDA representation of a number is not necessarily the same as a NumPy representation, and I'm happy to learn more and accommodate if documentation is available.)
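One thing I wondered about, though this is just a guess on my part and depends on the GPU and PyTorch version: newer NVIDIA GPUs can run float32 matmuls in TF32 mode, which uses a much shorter (10-bit) mantissa, and PyTorch exposes a flag for it. If I've read the docs correctly, checking and disabling it before re-running the comparison would look something like:

```python
import torch

# Check whether TF32 matmuls are currently allowed (may be True by
# default on Ampere-or-newer GPUs with recent PyTorch versions).
print(torch.backends.cuda.matmul.allow_tf32)

# Force full-precision float32 matmuls, then re-run the GPU comparison.
torch.backends.cuda.matmul.allow_tf32 = False
```

I haven't confirmed this is the cause in my case, but the size of the discrepancy seems roughly consistent with a reduced-precision matmul.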
Thanks and regards,
Simon