Considerable absolute error in torch.matmul

Hi, I’m seeing a high absolute error in torch.matmul. Here is my example code.

import torch
from torch import tensor


def main():
    torch.cuda.manual_seed(42)  # seed the CUDA RNG for reproducibility
    mask = tensor([[0., 0., 0.], [1.0, 0., 0.]], device='cuda')
    matrix = torch.randn(3, 3, device='cuda')
    print(matrix)

    # Row 1 of `mask` is [1, 0, 0], so row 1 of `res` should exactly equal row 0 of `matrix`
    res = torch.matmul(mask, matrix)
    print(res)


if __name__ == "__main__":
    main()

And the result is

tensor([[ 0.1940,  2.1614, -0.1721],
        [ 0.8491, -1.9244,  0.6530],
        [-0.6494, -0.8175,  0.5280]], device='cuda:0')
tensor([[ 0.0000,  0.0000,  0.0000],
        [ 0.1940,  2.1621, -0.1720]], device='cuda:0')

where I get a high absolute error between 2.1614 and 2.1621, which are expected to be the same. When I move all tensors to the CPU, the result is correct. I wonder whether this is a problem in PyTorch or the CUDA backend, and how I can fix it.
My PyTorch version is 1.10.1+cu113 and my CUDA toolkit version is 11.5.
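
For reference, here is a rough sketch, using the same seed as above, of how I quantify the difference against a CPU reference:

import torch

torch.cuda.manual_seed(42)
mask = torch.tensor([[0., 0., 0.], [1.0, 0., 0.]], device='cuda')
matrix = torch.randn(3, 3, device='cuda')

res_gpu = torch.matmul(mask, matrix)
res_cpu = torch.matmul(mask.cpu(), matrix.cpu())

# Max absolute difference; I see roughly 7e-4, far above float32 rounding error
print((res_gpu.cpu() - res_cpu).abs().max())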

PS:
As for my running environment: I first produced this issue in WSL2, where I can’t determine the actual CUDA toolkit version because I let it use the CUDA libraries provided by the Windows driver. Installing a CUDA toolkit of a specific version such as 11.3 through apt makes CUDA unavailable in PyTorch, so I can’t use nvcc to check the toolkit version in WSL2. I then reproduced the problem on my server, which has an A100 GPU, running the code in the NGC PyTorch 21.11 container; its CUDA toolkit version is 11.5. I still got the same problem.

@Koramajin I am unable to reproduce the issue.
Torch: 1.10.1
CUDA toolkit: 11

tensor([[ 0.1940,  2.1614, -0.1721],
        [ 0.8491, -1.9244,  0.6530],
        [-0.6494, -0.8175,  0.5280]], device='cuda:0')
tensor([[ 0.0000,  0.0000,  0.0000],
        [ 0.1940,  2.1614, -0.1721]], device='cuda:0')

Hi Koramajin!

Could this be the TF32 “bug?”

What specific GPU are you using, and does the issue go away if you set
torch.backends.cuda.matmul.allow_tf32 = True?

Edit:

The above should read

torch.backends.cuda.matmul.allow_tf32 = False

True is the default for allow_tf32. The suggestion is to turn off TF32
which is done by setting allow_tf32 to False.
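
As a quick sketch (True is what I see as the default on 1.10):

import torch

# On this PyTorch version, allow_tf32 defaults to True for matmuls
print(torch.backends.cuda.matmul.allow_tf32)  # True

# Setting it to False makes float32 matmuls run at full FP32 precision
torch.backends.cuda.matmul.allow_tf32 = False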

Best.

K. Frank


Thanks for the reply. But it doesn’t work; the result still has the error.

It seems to be a problem associated with the CUDA toolkit. Could you tell me the exact version of your CUDA toolkit?

Hi Koramajin!

I made a mistake in my earlier post – see the edit, above.

Please try specifically setting the TF32 flag to False:

torch.backends.cuda.matmul.allow_tf32 = False

and see if that resolves the issue.
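
Something like this sketch should do it, with the flag set before the first matmul runs:

import torch

torch.backends.cuda.matmul.allow_tf32 = False  # disable TF32 before any matmul

torch.cuda.manual_seed(42)
mask = torch.tensor([[0., 0., 0.], [1.0, 0., 0.]], device='cuda')
matrix = torch.randn(3, 3, device='cuda')
print(torch.matmul(mask, matrix))  # should now agree with the CPU result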

(Sorry for the earlier mistake.)

Please note: My understanding is that your A100 GPU does support TF32,
so this behavior would be expected.
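
As a rough back-of-the-envelope check (the 2**-11 figure is my own bound, assuming TF32’s 10-bit mantissa):

# TF32 keeps only a 10-bit mantissa, so relative errors up to about 2**-11 are expected
tf32_eps = 2 ** -11                        # ~4.9e-4
rel_err = abs(2.1621 - 2.1614) / 2.1614    # ~3.2e-4, within the TF32 error budget
print(tf32_eps, rel_err)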

Best.

K. Frank

Thanks a lot. I didn’t realize that they had optimized matrix multiplication by using the TensorFloat32 cores on the GPU.

Setting torch.backends.cuda.matmul.allow_tf32 = False works well on Colab.
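
One related note, based on my reading of the PyTorch docs: convolutions have their own TF32 switch, so both flags may matter if you need full FP32 precision everywhere:

import torch

torch.backends.cuda.matmul.allow_tf32 = False  # GEMMs / torch.matmul
torch.backends.cudnn.allow_tf32 = False        # cuDNN convolutions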