I am using PyTorch 1.8.0 and cannot update to a newer version due to other dependency issues.
I am facing this issue: C = A @ B, where A and B are torch.float32, but somehow C becomes torch.float16. Setting torch.backends.cuda.matmul.allow_tf32 = False does not solve it.
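For context, here is a minimal sketch of the check I am running (shapes and values are made up for illustration; my real tensors are larger):

```python
import torch

# Minimal sketch of the check (shapes are illustrative only).
A = torch.randn(16, 16, dtype=torch.float32)
B = torch.randn(16, 16, dtype=torch.float32)

C = A @ B
print(C.dtype)  # torch.float32 on CPU; on my GPU the result comes back as float16

if torch.cuda.is_available():
    # This is the flag I already tried, without success:
    torch.backends.cuda.matmul.allow_tf32 = False
    C_gpu = A.cuda() @ B.cuda()
    print(C_gpu.dtype)
```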
Hi!
There was an issue related to this in a previous version of PyTorch; it has since been resolved, so you'll need to update your PyTorch version. Please see:
In [1]: import torch

In [2]: tensor1 = torch.randn(3, dtype=torch.float32)

In [3]: tensor2 = torch.randn(3, dtype=torch.float32)

In [4]: torch.matmul(tensor1, tensor2).dtype
Out[4]: torch.float32

In [5]: k = tensor1 @ tensor2

In [6]: k
Out[6]: tensor(0.4118)

In [7]: k.dtype
Out[7]: torch.float32
Like I said, I cannot update PyTorch from 1.8 because it breaks other dependencies in the virtual environment I am using.
Is there another solution?
Note that TF32 is not float16, so I would like to clarify the issue a bit more.
Are you seeing the dtype of the result tensor reported as float16, or are you running into a precision loss and speculating that float16 might be used internally?
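For example, a quick way to separate the two cases is to check the reported dtype and also compare against a float64 reference (a rough sketch; the error magnitudes in the comments are illustrative ballparks, not exact guarantees):

```python
import torch

# Separate "wrong dtype" from "precision loss":
A = torch.randn(64, 64, dtype=torch.float32)
B = torch.randn(64, 64, dtype=torch.float32)

C = A @ B
print(C.dtype)  # if this prints torch.float16, it's really a dtype issue

# Compare against a float64 reference to estimate the precision loss.
ref = A.double() @ B.double()
max_err = (C.double() - ref).abs().max().item()
print(max_err)  # ~1e-5 for true float32; TF32 or float16 would be much larger
```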
As @khushi-411 already pointed out: TF32 might be used (it has reduced precision but the same range as float32), and it can be disabled via torch.backends.cuda.matmul.allow_tf32 = False in 1.8.0 as well, as seen here. The default behavior was changed in later versions: TF32 is now disabled by default for matmuls but kept enabled for cuDNN calls.
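For completeness, a sketch of both TF32 switches (the flag names should be the same in 1.8.0; they only affect CUDA kernels on Ampere or newer GPUs, while CPU matmuls always run in true float32):

```python
import torch

# Disable TF32 for both backends that can use it:
torch.backends.cuda.matmul.allow_tf32 = False  # matmuls / linear layers
torch.backends.cudnn.allow_tf32 = False        # cuDNN calls (e.g. convolutions)

print(torch.backends.cuda.matmul.allow_tf32)  # False
print(torch.backends.cudnn.allow_tf32)        # False
```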
Since this flag does not change the behavior, your issue seems to be unrelated to TF32.
As already mentioned: TF32 uses the float32 range and will not overflow at the float16 range limits, which also suggests your issue is unrelated to TF32.
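To illustrate the range argument with a small sketch (no GPU needed): float16 overflows just above 65504, while float32, and thus TF32, which shares float32's 8 exponent bits, represents much larger magnitudes without trouble.

```python
import torch

# float32 holds 1e30 comfortably; casting it to float16 overflows to inf.
x = torch.tensor(1e30, dtype=torch.float32)
print(x)                    # a finite float32 value
print(x.to(torch.float16))  # tensor(inf): out of the float16 range
```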
Hello,
I am using torch version 1.13 and see the same error. The answers above suggest the issue was resolved in newer versions of torch.
Was it still not resolved in 1.13?