Torch matmul sometimes gives wrong results

I use libtorch in my C++ project on the Jetson Orin platform.
When using matmul, I found that it sometimes gives wrong results.

The first matrix has size [1600, 1216, 4].
The second matrix has size [B, 4, 3], where B is the batch size.
The dtype of both matrices is float32.
I call matmul as follows:

torch::matmul(m1.view({-1, 4}), m2);

I found that when B is small (e.g. <= 2), the result is correct. But when B is large (e.g. 30), the result
is wrong.
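
For reference, the shapes involved in this call can be sketched as follows, assuming the usual matmul broadcasting rule in which a 2-D left operand [N, 4] is multiplied against every batch entry of a 3-D right operand [B, 4, 3]. This is plain C++ rather than libtorch code, and `Shape3`, `flattened_rows`, `broadcast_matmul_shape` and `float32_bytes` are hypothetical helper names made up for illustration.

```cpp
#include <cstdint>

// Shape of a 3-D result tensor: [batch, rows, cols].
struct Shape3 {
    std::int64_t batch;
    std::int64_t rows;
    std::int64_t cols;
};

// m1.view({-1, 4}) collapses [d0, d1, 4] into [d0 * d1, 4].
std::int64_t flattened_rows(std::int64_t d0, std::int64_t d1) {
    return d0 * d1;
}

// Assumed broadcasting rule: [N, K] x [B, K, M] -> [B, N, M].
Shape3 broadcast_matmul_shape(std::int64_t n, std::int64_t batch, std::int64_t m) {
    return Shape3{batch, n, m};
}

// Size in bytes of the result when stored as float32 (4 bytes per element).
std::int64_t float32_bytes(const Shape3& s) {
    return s.batch * s.rows * s.cols * 4;
}
```

With the sizes above, `flattened_rows(1600, 1216)` is 1945600, so for B = 30 the result has shape [30, 1945600, 3], which is about 700 MB in float32.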

When I changed the dtype to float64, the result was correct.

In my opinion, float32 should be enough for this computation. Example data looks like this:
M1 = """
089.4820 042.1113 022.1533 001.0000
089.4870 042.1113 022.1523 001.0000
089.4921 042.1113 022.1523 001.0000
089.4970 042.1113 022.1523 001.0000
089.5020 042.1113 022.1523 001.0000
089.5070 042.1113 022.1523 001.0000
089.5120 042.1113 022.1514 001.0000
089.5170 042.1113 022.1514 001.0000
089.5220 042.1113 022.1514 001.0000
089.5270 042.1113 022.1514 001.0000
"""
M2 = """
-1.6649 0001.3089 -0.2094
0001.1314 0001.5086 -0.5520
-0.3370 -1.4009 -0.8071
0107.2991 -150.7653 0062.4391
"""

Also, when I use a matmul that I implemented myself, the result is correct.
So I think this may be (I am not sure) a bug in libtorch. Does anyone have an idea how to solve it?
Thanks

Could you post a minimal and executable code snippet to reproduce the issue, the results you are seeing, as well as more information about your setup (i.e. which JetPack version are you using, how did you install PyTorch etc.)?