Matmul casting as float16 internally

I am using pytorch 1.8.0 and I cannot update it to newer versions due to other dependency issues.
I am facing this issue (C = A@B where A, B are torch.float32 but somehow C becomes torch.float16) but setting torch.backends.cuda.matmul.allow_tf32 = False does not solve it.

How can I fix this?

Hi!
There was an issue related to this in a previous version of PyTorch, and it has since been resolved, so you’ll need to update your PyTorch version. On a current version the result stays in float32:

In [1]: import torch

In [2]: tensor1 = torch.randn(3, dtype=torch.float32)

In [3]: tensor2 = torch.randn(3, dtype=torch.float32)

In [4]: torch.matmul(tensor1, tensor2).dtype
Out[4]: torch.float32

In [5]: k = tensor1@tensor2; k
Out[5]: tensor(0.4118)

In [6]: k.dtype
Out[6]: torch.float32

Thanks!

EDIT: @tom shared this issue in the link you attached (it has since been resolved): "RFC: Should matmuls use tf32 by default?", Issue #67384 in pytorch/pytorch on GitHub.

Like I said, I cannot update the pytorch version from 1.8 because it breaks other dependencies in the virtual environment that I am using.
Is there another solution?

Another option would be explicitly casting it to float32. In your case, it would be:

C = C.type(torch.float32)  # type() returns a new tensor, so assign the result

Thanks!

But this would be after C has already been computed in float16, right?
My issue is that C being in float16 results in nans in C.
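To make the concern concrete, here is a minimal sketch of why a cast applied after the fact cannot help. It does not run a real float16 matmul; it only simulates a float16 result by rounding a float32 product to half precision, and the values 300.0 are made up for illustration.

```python
import torch

# Illustrative inputs: their product (90000) is above float16's maximum of 65504.
a = torch.tensor([300.0], dtype=torch.float32)
b = torch.tensor([300.0], dtype=torch.float32)

# Stand-in for a result that ended up in float16: the value overflows to inf.
c_half = (a * b).half()

# Casting afterwards keeps the inf; the finite value is already gone.
c_late = c_half.type(torch.float32)

# Staying in float32 from the start keeps the finite value.
c_f32 = a * b

print("cast after float16:", c_late)
print("computed in float32:", c_f32)
# inf - inf is one way such an overflow later turns into nan.
print("inf - inf:", c_late - c_late)
```

So if the product really is produced in float16, the fix has to happen before the multiplication, not after it.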

But this would be after C has already been computed in float16, right?

Yep.

I don’t think you can do anything about internal problems in PyTorch. The only solution is updating the version.

Hi @conv8d,

Have you tried setting the default dtype via torch.set_default_dtype(torch.float32)?

Also, can you set up a minimal reproducible script to show your error?

Note that TF32 is not float16, so I would like to clarify the issue a bit more.
Are you seeing the dtype of the result tensor reported as float16, or are you running into a precision loss and speculating that float16 might be used internally?
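A script along these lines might be a reasonable starting point for the reproduction. It is only a sketch: the helper name describe_matmul and the tensor shapes are made up, and the autocast check is my own guess at something worth ruling out, not a cause established in this thread.

```python
import torch

def describe_matmul(a, b):
    """Collect facts about a @ b that are useful in a bug report (hypothetical helper)."""
    c = a @ b
    return {
        "torch_version": torch.__version__,
        "device": str(a.device),
        "input_dtypes": (a.dtype, b.dtype),
        "output_dtype": c.dtype,
        "default_dtype": torch.get_default_dtype(),
        # Guess: an active autocast context is worth ruling out.
        "autocast_enabled": torch.is_autocast_enabled(),
        "allow_tf32": torch.backends.cuda.matmul.allow_tf32,
        "inputs_finite": bool(torch.isfinite(a).all() and torch.isfinite(b).all()),
        "output_finite": bool(torch.isfinite(c).all()),
    }

# Use the GPU when one is present, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
A = torch.randn(4, 8, dtype=torch.float32, device=device)
B = torch.randn(8, 3, dtype=torch.float32, device=device)

report = describe_matmul(A, B)
for key, value in report.items():
    print(f"{key}: {value}")
```

Replacing A and B with the actual tensors from your model and posting the printed report would show whether the output dtype really is float16 and whether the inputs already contain non-finite values.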

As @khushi-411 already pointed out, TF32 might be used (it has reduced precision but the same range as float32). It can be disabled via torch.backends.cuda.matmul.allow_tf32 = False, and this flag is also available in 1.8.0. Later versions changed the default to disable TF32 for matmuls while keeping it enabled for cuDNN calls.

Since this flag does not change the behavior, your issue seems to be unrelated to TF32.

As already mentioned, TF32 keeps the float32 exponent range, so values that would overflow float16 (anything above 65504) remain finite under TF32. The nans you describe would therefore not be explained by TF32, which also suggests your issue is unrelated to it.
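The range-versus-precision difference can be sketched in plain Python without a GPU. This is only an approximation under stated assumptions: simulate_tf32 is a made-up helper that truncates a float32 value to 10 mantissa bits, whereas real hardware may round differently, and fits_float16 relies on the standard library's half-precision packing.

```python
import math
import struct

def simulate_tf32(x):
    """Approximate TF32: keep float32's sign and exponent, truncate to 10 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # clear the 13 lowest of float32's 23 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def fits_float16(x):
    """Return True if x can be stored as a finite float16 value."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False

big = 1.0e5  # larger than float16's maximum of 65504
approx = simulate_tf32(big)

print("TF32-like value:", approx)
print("relative error:", abs(approx - big) / big)
print("fits in float16:", fits_float16(big))
```

The TF32-like value is slightly off but finite, while the same number cannot be stored in float16 at all, which is why overflow to inf or nan points away from TF32.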
