What’s going on here? Any comments would be appreciated.
On Win10, torch.matmul of [[1.0]] and [[1.0001]] (2-D, 1×1 tensors) incorrectly returns 1.0, whereas torch.matmul of the 1-D tensors [1.0] and [1.0001] correctly returns 1.0001.
It seems as if the internal accumulator uses reduced precision, perhaps only float16. The problem disappears with float64, or without CUDA.
I cannot reproduce it on an Ubuntu machine.
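The reduced-precision hypothesis can be sanity-checked without a GPU: if the matmul internally rounds its operands to a format with a 10-bit mantissa (as float16 does), 1.0001 is no longer representable and collapses to exactly 1.0. A minimal stdlib sketch, using Python's struct half-precision format:

```python
import struct

# Round-trip 1.0001 through IEEE half precision (10-bit mantissa).
# The spacing between representable values at 1.0 is 2**-10 ~= 9.8e-4,
# so 1.0001 rounds back to exactly 1.0 -- matching the observed output.
rounded = struct.unpack('e', struct.pack('e', 1.0001))[0]
print(rounded)  # 1.0
```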
Code
import torch

dtype = torch.float32

# 2-D case: matmul of two 1x1 matrices
A = torch.tensor([[1.]], dtype=dtype).cuda()
B = torch.tensor([[1.0001]], dtype=dtype).cuda()
test1 = torch.matmul(A, B)

# 1-D case: dot product of two length-1 vectors
A = torch.tensor([1.], dtype=dtype).cuda()
B = torch.tensor([1.0001], dtype=dtype).cuda()
test2 = torch.matmul(A, B)

print(test1)
print(test2)
print(torch.version.cuda)
print(torch.__version__)
Output
tensor([[1.]], device='cuda:0')
tensor(1.0001, device='cuda:0')
11.3
1.10.0+cu113
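If the Win10 machine has an Ampere-class GPU, one thing worth checking is the TF32 matmul flag: TF32 keeps only a 10-bit mantissa inside float32 matmuls, which would produce exactly this rounding, and it was enabled by default in this PyTorch version. A hedged sketch (whether TF32 actually applies depends on the GPU):

```python
import torch

# TF32 (Ampere GPUs) truncates matmul operands to a 10-bit mantissa,
# which cannot distinguish 1.0001 from 1.0.
print(torch.backends.cuda.matmul.allow_tf32)

# Force full float32 precision in matmuls and re-run the test above.
torch.backends.cuda.matmul.allow_tf32 = False
```

With the flag set to False, the 2-D matmul should return tensor([[1.0001]]) if TF32 was the cause.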