# Numerical issue in torch.matmul

### This is a copy from the Github issue I raised.

I tried to use the broadcast feature of torch.matmul, but there are numerical discrepancies when a certain type of sparse matrix is involved. To be clear, I did not use any sparse matrix format, but most elements of the matrix `lapa_c` are zero (see the sample code below).

## To Reproduce

```python
import torch

lapa_c = Build_CoefficientMatrix([1/90, -3/20, 3/2, -49/18], 64, periodic=True)

x = torch.rand(2, 64, 1)
out1 = torch.matmul(lapa_c, x)
out2 = torch.matmul(lapa_c, x[0, :, 0])
print(out1[0, 0, 0].item())
print(out2[0].item())
```
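To quantify the mismatch rather than eyeballing a single entry, one can compare the full batched and unbatched results. This is a hypothetical check (the random `lapa_c` below is a stand-in for the output of `Build_CoefficientMatrix`, since only the shapes matter here):

```python
import torch

torch.manual_seed(0)
lapa_c = torch.rand(64, 64)              # stand-in for the coefficient matrix
x = torch.rand(2, 64, 1)

out1 = torch.matmul(lapa_c, x)           # batched result: shape (2, 64, 1)
out2 = torch.matmul(lapa_c, x[0, :, 0])  # unbatched result: shape (64,)

# Maximum absolute discrepancy between the two code paths
diff = (out1[0, :, 0] - out2).abs().max()
print(f"max abs difference: {diff.item():.3e}")
```

Any nonzero value printed here is purely due to the backend choosing different kernels for the two shapes.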

`Build_CoefficientMatrix` builds a 64×64 finite-difference coefficient matrix (with some non-zero elements in the bottom-left and top-right corners when `periodic=True`):

```python
import torch

def Build_CoefficientMatrix(c: list, meshx: int, periodic: bool = False):
    '''
    c is the list of FD coefficients,
    e.g. for the 1st derivative with a 2nd-order-accurate central difference:
    c = [-0.5, 0]
    '''
    if 2 * len(c) - 1 >= meshx:
        raise ValueError
    acc = len(c)

    tmp = []
    c.reverse()
    for i in range(acc):
        # c[i] on the i-th sub-diagonal, padded back to (meshx, meshx)
        x = torch.cat((torch.cat((torch.zeros((i, meshx - i)),
                                  c[i] * torch.eye(meshx - i)), dim=0),
                       torch.zeros((meshx, i))), dim=1)
        tmp.append(x)
    re = tmp[0]
    for k in tmp[1:]:
        re += k + k.T

    if periodic:
        # wrap the stencil around the corners for periodic boundaries
        re[:acc, -acc:] = re[acc:2 * acc, :acc]
        re[-acc:, :acc] = re[:acc, acc:2 * acc]
    return re
```
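To make the structure concrete, here is a hypothetical miniature of the kind of matrix the function produces, built directly for a 3-point Laplacian stencil `c = [1, -2]` on a periodic mesh of 6 points (the values and size are my own illustration, not taken from the report):

```python
import torch

n = 6
c = [1.0, -2.0]                 # off-diagonal and main-diagonal coefficients

m = torch.zeros(n, n)
m += c[1] * torch.eye(n)                              # main diagonal: -2
m[torch.arange(n - 1), torch.arange(1, n)] = c[0]     # super-diagonal: 1
m[torch.arange(1, n), torch.arange(n - 1)] = c[0]     # sub-diagonal: 1
m[0, -1] = m[-1, 0] = c[0]                            # periodic corner entries
print(m)
```

The two corner entries are exactly the non-zero corner elements mentioned above; everything else off the band is zero, which is why the matrix is sparse in practice even though it is stored densely.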

## Expected behavior

The two printed values are supposed to be identical, but they are not.

## If you change

- the batch size of `x` back to `1` (`x=torch.rand(1,64,1)`),
- the `periodic` parameter of `Build_CoefficientMatrix` to `False`, or
- the `c` parameter of `Build_CoefficientMatrix` to a list of length no greater than 3,

then the two results match.

According to albanD in pytorch/pytorch#58128:

> when you change the input size, the low level compute libraries that we use might choose a different algorithm which can lead to small difference of the order of 1e-6 for each single op (as per the floating point standard). It's not something we can change.
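The effect albanD describes does not need BLAS to demonstrate: in floating point, changing the order of operations (as a different kernel would) changes the rounding, so two mathematically equal sums need not be bit-identical. A minimal plain-Python sketch:

```python
import random

random.seed(0)
vals = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

# The same numbers summed in two different orders: mathematically equal,
# but each order accumulates rounding error differently.
s_forward = sum(vals)
s_backward = sum(reversed(vals))
print(s_forward == s_backward, abs(s_forward - s_backward))
```

A matmul is a large collection of such sums, so picking a different algorithm for a different input shape changes the summation order and hence the low-order bits of the result.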

But in my case, I need to apply torch.matmul iteratively hundreds of times (e.g. in reinforcement learning, a single episode contains hundreds of steps), and the accumulated error becomes unacceptable. I just wonder whether it would be possible to add an option forcing the low-level compute libraries to use the same algorithm (even if it is not the most appropriate one).
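In the absence of such an option, one possible workaround (my own suggestion, not an official PyTorch feature) is to keep every matmul call at the same input shape, so the backend always selects the same kernel: loop over the batch dimension instead of relying on broadcasting. A sketch under that assumption:

```python
import torch

def matmul_per_batch(mat: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Apply the same (N, N) matrix to each (N, 1) batch element separately,
    so every underlying matmul call has identical shapes."""
    return torch.stack([torch.matmul(mat, x[b]) for b in range(x.shape[0])])

torch.manual_seed(0)
mat = torch.rand(64, 64)
x = torch.rand(2, 64, 1)

out = matmul_per_batch(mat, x)   # shape (2, 64, 1)
```

This trades speed for consistency across batch sizes; casting to `torch.float64` is another common mitigation, since it reduces the per-op error from ~1e-6 to ~1e-15 and slows the accumulation, though it still does not make differently-shaped calls bit-identical.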

## Environment

PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.8.0-55-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 3070
Nvidia driver version: 460.80
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.8.1
[pip3] torch-cluster==1.5.8
[pip3] torch-geometric==1.6.3
[pip3] torch-scatter==2.0.5
[pip3] torch-sparse==0.6.8
[pip3] torch-spline-conv==1.2.0
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchvision==0.9.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.1 h6406543_8 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2020.4 h726a3e6_304 conda-forge
[conda] mkl-service 2.3.0 py38h1e0a361_2 conda-forge
[conda] mkl_fft 1.3.0 py38h5c078b8_1 conda-forge
[conda] mkl_random 1.2.0 py38hc5bc63f_1 conda-forge
[conda] numpy 1.19.2 py38h54aff64_0
[conda] numpy-base 1.19.2 py38hfa32c7d_0
[conda] numpydoc 1.1.0 py_1 conda-forge
[conda] pytorch 1.8.1 py3.8_cuda11.1_cudnn8.0.5_0 pytorch
[conda] torch-cluster 1.5.8 pypi_0 pypi
[conda] torch-geometric 1.6.3 pypi_0 pypi
[conda] torch-scatter 2.0.5 pypi_0 pypi
[conda] torch-sparse 0.6.8 pypi_0 pypi
[conda] torch-spline-conv 1.2.0 pypi_0 pypi
[conda] torchaudio 0.8.1 py38 pytorch
[conda] torchvision 0.9.1 py38_cu111 pytorch