Numerical issue in torch.matmul

This is a copy of the GitHub issue I raised.


I tried to use the broadcast feature of torch.matmul, but I ran into numerical errors when a certain type of sparse matrix is involved. To be clear, I did not use any sparse matrix format; it is just that most elements of the matrix lapa_c are zero (see the sample code below).
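
For context, here is a minimal sketch of the broadcasting behavior I mean, using the same shapes as my repro below:

import torch

A = torch.rand(64, 64)    # plain 2-D matrix
x = torch.rand(2, 64, 1)  # batch of 2 column vectors

# torch.matmul broadcasts the 2-D matrix over the batch dimension,
# performing one matrix-vector product per batch element.
out = torch.matmul(A, x)
print(out.shape)  # torch.Size([2, 64, 1])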

To Reproduce


import torch

lapa_c = Build_CoefficientMatrix([1/90, -3/20, 3/2, -49/18], 64, periodic=True)

x = torch.rand(2, 64, 1)
out1 = torch.matmul(lapa_c, x)           # batched: (64,64) @ (2,64,1) -> (2,64,1)
out2 = torch.matmul(lapa_c, x[0, :, 0])  # unbatched: (64,64) @ (64,) -> (64,)
print(out1[0, 0, 0].item())
print(out2[0].item())
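
To make the mismatch easier to quantify, one can also compare the full vectors instead of a single entry (a small sketch of mine, reusing the tensors above):

# Maximum absolute difference between the batched and unbatched
# results; it should be exactly zero, but it is not.
diff = (out1[0, :, 0] - out2).abs().max()
print(diff.item())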

Build_CoefficientMatrix builds a 64x64 finite-difference coefficient matrix (with some non-zero elements in the bottom-left and upper-right corners when periodic=True):

def Build_CoefficientMatrix(c: list, meshx: int, periodic: bool = False):
    '''
    c is the list of FD coefficients,
    e.g. for the 1st derivative with 2nd-order accuracy central difference:
    c = [-0.5, 0]
    '''
    if 2 * len(c) - 1 >= meshx:
        raise ValueError("stencil is wider than the mesh")
    acc = len(c)

    # Build one shifted diagonal per coefficient, center coefficient first.
    tmp = []
    c = c[::-1]  # reversed copy, so the caller's list is not mutated
    for i in range(acc):
        x = torch.cat((torch.cat((torch.zeros((i, meshx - i)),
                                  c[i] * torch.eye(meshx - i)), dim=0),
                       torch.zeros((meshx, i))), dim=1)
        tmp.append(x)

    # Main diagonal plus the symmetrized off-diagonals.
    re = tmp[0]
    for k in tmp[1:]:
        re += k + k.T

    # Wrap the band around the corners for periodic boundary conditions.
    if periodic:
        re[:acc, -acc:] = re[acc:2 * acc, :acc]
        re[-acc:, :acc] = re[:acc, acc:2 * acc]
    return re
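
As a quick sanity check (my own addition, not part of the original issue), the classic 2nd-order Laplacian stencil [1, -2] produces the familiar periodic tridiagonal matrix:

L = Build_CoefficientMatrix([1.0, -2.0], 6, periodic=True)
print(L)
# Expected: -2 on the diagonal, 1 on the sub-/super-diagonals,
# and 1 in the corners from the periodic wrap-around.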

Expected behavior

The two printed outputs should be identical, but they are not.

If you change any of the following:

  • the batch size of x to 1 (x=torch.rand(1,64,1))
  • the periodic parameter of Build_CoefficientMatrix to False
  • the c parameter of Build_CoefficientMatrix to a list of length no greater than 3

then the two results match.

According to albanD in pytorch/pytorch issue #58128 (not linked directly, since new users can only put 2 links):

when you change the input size, the low level compute libraries that we use might choose a different algorithm which can lead to small difference of the order of 1e-6 for each single op (as per the floating point standard). It’s not something we can change.
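
As I understand it, this comes down to ordinary floating-point non-associativity: a kernel that sums the same partial products in a different order changes the last bits of the result. A minimal sketch:

a = torch.tensor(1e8, dtype=torch.float32)
b = torch.tensor(-1e8, dtype=torch.float32)
c = torch.tensor(1.0, dtype=torch.float32)

# The same three numbers summed in two orders give different
# float32 results, so two matmul kernels can legitimately disagree.
print(((a + b) + c).item())  # 1.0
print((a + (b + c)).item())  # 0.0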

But in my case I need to call torch.matmul iteratively hundreds of times (e.g. in reinforcement learning, a single episode contains hundreds of steps), and the accumulated error becomes unacceptable. I just wonder if it is possible to add an option that forces the low-level compute libraries to use the same algorithm (even if it is not the most appropriate one). Two workarounds I can think of in the meantime are sketched below.
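
These are sketches of mine, not an official API:

# 1. Never mix shapes: take the single-sample result out of the
#    batched product instead of recomputing it with a 1-D input.
out2_same_path = torch.matmul(lapa_c, x)[0, :, 0]

# 2. Do the product in float64 and cast back; the per-step rounding
#    error drops from roughly 1e-6 to roughly 1e-15.
out64 = torch.matmul(lapa_c.double(), x.double()).float()

(torch.use_deterministic_algorithms exists, but as far as I know it only makes results reproducible across runs; it does not force the same kernel for different input shapes.)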

Environment

PyTorch version: 1.8.1
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.8.0-55-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce RTX 3070
Nvidia driver version: 460.80
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] numpydoc==1.1.0
[pip3] torch==1.8.1
[pip3] torch-cluster==1.5.8
[pip3] torch-geometric==1.6.3
[pip3] torch-scatter==2.0.5
[pip3] torch-sparse==0.6.8
[pip3] torch-spline-conv==1.2.0
[pip3] torchaudio==0.8.0a0+e4e171a
[pip3] torchvision==0.9.1
[conda] blas 1.0 mkl
[conda] cudatoolkit 11.1.1 h6406543_8 conda-forge
[conda] ffmpeg 4.3 hf484d3e_0 pytorch
[conda] mkl 2020.4 h726a3e6_304 conda-forge
[conda] mkl-service 2.3.0 py38h1e0a361_2 conda-forge
[conda] mkl_fft 1.3.0 py38h5c078b8_1 conda-forge
[conda] mkl_random 1.2.0 py38hc5bc63f_1 conda-forge
[conda] numpy 1.19.2 py38h54aff64_0
[conda] numpy-base 1.19.2 py38hfa32c7d_0
[conda] numpydoc 1.1.0 py_1 conda-forge
[conda] pytorch 1.8.1 py3.8_cuda11.1_cudnn8.0.5_0 pytorch
[conda] torch-cluster 1.5.8 pypi_0 pypi
[conda] torch-geometric 1.6.3 pypi_0 pypi
[conda] torch-scatter 2.0.5 pypi_0 pypi
[conda] torch-sparse 0.6.8 pypi_0 pypi
[conda] torch-spline-conv 1.2.0 pypi_0 pypi
[conda] torchaudio 0.8.1 py38 pytorch
[conda] torchvision 0.9.1 py38_cu111 pytorch