Matmul gives different results depending on operating system

Below is an example of matrix multiplication that incurs precision loss in float32. Any idea why the result is different when I run it on Mac vs Linux?

import torch

x = torch.tensor([[11041., 13359, 15023, 18177],
                  [13359, 16165, 18177, 21995],
                  [15023, 18177, 20453, 24747],
                  [18177, 21995, 24747, 29945]])
y = torch.tensor([[29945., -24747, -21995, 18177],
                  [-24747, 20453, 18177, -15023],
                  [-21995, 18177, 16165, -13359],
                  [18177, -15023, -13359, 11041]])
print(x @ y)

On Linux I’m seeing the following result, on both GPU and CPU (Colab):

[[ 33.  17.   1.   1.]
 [ 11.  27.  -5.  -5.]
 [ 11.  -5.  27. -21.]
 [ -7.   9.   9.  25.]]

Whereas on my Mac laptop, I see the following:

[[28.,  0., -4.,  0.],
 [ 0., 28.,  0., -4.],
 [20.,  0., 20.,  0.],
 [ 0., 20.,  0., 20.]]

Hi,

The expected result is 16 * Identity, right?
I guess different BLAS libraries, which potentially use different algorithms for mm, will lead to different results. Running on different GPUs, or with a different number of CPU cores (for larger matrices), would give different results as well.
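
For reference, repeating the product in float64 should recover the exact 16 * Identity, since the intermediate products (around 3.3e8) exceed the precision of float32's 24-bit mantissa but are represented exactly in float64. A minimal check:

import torch

# Same matrices as in the first post, but in float64: the partial products
# are exact, so the cancellation is exact and the result is 16 * I.
x = torch.tensor([[11041., 13359, 15023, 18177],
                  [13359, 16165, 18177, 21995],
                  [15023, 18177, 20453, 24747],
                  [18177, 21995, 24747, 29945]], dtype=torch.float64)
y = torch.tensor([[29945., -24747, -21995, 18177],
                  [-24747, 20453, 18177, -15023],
                  [-21995, 18177, 16165, -13359],
                  [18177, -15023, -13359, 11041]], dtype=torch.float64)
print(x @ y)  # exactly 16 * identity
print(torch.allclose(x @ y, 16 * torch.eye(4, dtype=torch.float64)))  # True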

Yes, 16 * Identity is the correct answer. The curious part is that the CPU result is identical to the GPU result when running on Volta, so they must be using the same algorithm.

Actually, on my machine a binary install gives me the same thing as your Mac, while a source install gives the same thing as your Linux (with everything else being the same). Do you use OpenBLAS on your Linux and MKL on your Mac, by any chance?

For both Linux and Mac, I installed PyTorch using the latest official conda instructions for PyTorch 1.2. Is there a way to check if it’s using MKL?

BTW, this came up when trying to debug a Hessian calculation of f(X) = sum(Y*Y) with Y = AXB, where all three matrices are 2x2 and initialized with entries 1, 2, 3, …, 12. It looks like numerical cancellation can be an issue even for tiny examples.
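
A minimal sketch of that setup, assuming A takes entries 1–4, X entries 5–8, and B entries 9–12 (the exact assignment isn’t spelled out above), computing the Hessian with two autograd.grad passes:

import torch

# Hypothetical split of the entries 1..12 across the three 2x2 matrices.
A = torch.arange(1., 5.).reshape(2, 2)
X = torch.arange(5., 9.).reshape(2, 2).requires_grad_(True)
B = torch.arange(9., 13.).reshape(2, 2)

Y = A @ X @ B
f = (Y * Y).sum()

# First derivative df/dX, keeping the graph so it can be differentiated again.
(grad,) = torch.autograd.grad(f, X, create_graph=True)

# Hessian: differentiate each entry of the gradient w.r.t. X again (4x4).
hessian = torch.stack([
    torch.autograd.grad(g, X, retain_graph=True)[0].reshape(-1)
    for g in grad.reshape(-1)
])
print(hessian)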

You can use the get_env_info() function from the torch.utils.collect_env module to get some information.
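
For instance (the exact fields reported vary with the PyTorch version):

# Dumps environment details gathered by PyTorch itself (OS, installed
# packages, CUDA, etc.).
from torch.utils.collect_env import get_env_info
print(get_env_info())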

To be sure what is used, the best way is to open Python, import torch, and get the path to the main C library with torch._C.__file__. Then, in your terminal, run ldd path_to_that_library. It will list the shared libraries it links against.
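
Concretely, something like this from Python, followed by ldd on the printed path in a shell:

import torch
# Location of the compiled extension; pass this path to `ldd` in a shell
# to list the shared libraries it links against.
print(torch._C.__file__)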

In my case, for the source install, there is the line: libopenblas.so.0 => MY_OPENBLAS_INSTALL_PATH/lib/libopenblas.so.0. So this one definitely uses my own OpenBLAS install.

For the binary install, no such line exists, but if you check libtorch.so, it jumps from 95MB in a source install to 230MB for a binary one. That’s because the MKL library is statically linked into it. You can see this by first getting the path to libtorch.so from the ldd command above, then running nm -gC path_to_libtorch.so | grep mkl to see all the symbols associated with MKL (there are a lot of them, so this will heavily spam your terminal).

Note that running the same command on the libtorch.so from the source install does not show any MKL symbols (only mkldnn ones, which are unrelated).

Now, if both your computers are using the MKL library, maybe MKL handles Mac machines differently?
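
By the way, a quicker check from Python, if you just want to know whether a build was compiled with MKL support at all (this does not tell you which BLAS kernel actually runs for a given op):

import torch
# True if this PyTorch build was compiled with MKL support.
print(torch.backends.mkl.is_available())
# Full build configuration, including BLAS info in recent versions.
print(torch.__config__.show())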

Thanks for the tip! It looks like the conda install brings in MKL, whereas the default AWS AMI images don’t include it.

I observed a couple of other results for this problem (curiously, always integer-valued) on different configurations; it looks like the real issue is the numerics of the underlying example.