torch._C._LinAlgError after calling torch.linalg.inv in PyTorch 1.11 with CUDA 11.3

Sureerat_Reaungamorn · March 31, 2022, 11:00pm

Hi everyone,

I get a torch._C._LinAlgError when I call torch.linalg.inv on an invertible matrix representing a similarity transformation (uniform scaling plus translation) with PyTorch 1.11 and CUDA 11.3.

Below is the code that produces the error:

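import torch
import numpy as np

device = torch.device('cuda')

# Batch of one 4x4 similarity transform: uniform scale 0.0138 plus a translation.
normalization = np.asarray([[[ 0.0138,  0.0000,  0.0000, -2.6834],
                             [ 0.0000,  0.0138,  0.0000, -2.2656],
                             [ 0.0000,  0.0000,  0.0138, -1.2021],
                             [ 0.0000,  0.0000,  0.0000,  1.0000]]])

# np.asarray yields float64, so the CUDA tensor is float64 as well.
normalization = torch.from_numpy(normalization).to(device)

# Raises torch._C._LinAlgError on PyTorch 1.11 + CUDA 11.3.
torch.linalg.inv(normalization)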

The full error message is:

torch._C._LinAlgError: cusolver error: CUSOLVER_STATUS_EXECUTION_FAILED, when calling `cusolverDnDgetrf( handle, m, n, dA, ldda, static_cast<double*>(dataPtr.get()), ipiv, info)`. This error may appear if the input matrix contains NaN.

With PyTorch 1.9 and CUDA 11.1 I do not get the error; the same code returns the expected inverse matrix:

tensor([[[ 72.4638,   0.0000,   0.0000, 194.4493],
         [  0.0000,  72.4638,   0.0000, 164.1739],
         [  0.0000,   0.0000,  72.4638,  87.1087],
         [  0.0000,   0.0000,   0.0000,   1.0000]]], device='cuda:0',
       dtype=torch.float64)
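
As a sanity check that the input really is invertible and contains no NaN, the inverse can also be computed without any GPU involvement. The sketch below assumes the matrix has the block form [[s*I, t], [0, 1]], whose inverse is [[I/s, -t/s], [0, 1]], and compares that closed form against NumPy's CPU inverse. The helper name similarity_inverse is hypothetical and only for illustration.

```python
import numpy as np

# Same 4x4 matrix as in the post, without the batch dimension.
normalization = np.asarray([[0.0138, 0.0000, 0.0000, -2.6834],
                            [0.0000, 0.0138, 0.0000, -2.2656],
                            [0.0000, 0.0000, 0.0138, -1.2021],
                            [0.0000, 0.0000, 0.0000,  1.0000]])


def similarity_inverse(m):
    """Invert a 4x4 matrix of the assumed form [[s*I, t], [0, 1]] in closed form."""
    s = m[0, 0]          # uniform scale factor
    t = m[:3, 3]         # translation vector
    inv = np.eye(4, dtype=m.dtype)
    inv[:3, :3] /= s     # inverse scale: I / s
    inv[:3, 3] = -t / s  # inverse translation: -t / s
    return inv


closed_form = similarity_inverse(normalization)
cpu_inverse = np.linalg.inv(normalization)

# Both routes should agree, and neither involves cuSOLVER.
print(np.isnan(normalization).any())
print(np.allclose(closed_form, cpu_inverse))
print(np.round(closed_form, 4))
```

If the two results agree and the NaN check is negative, the problem is unlikely to lie in the input data itself.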

Has anyone successfully called torch.linalg.inv using PyTorch 1.11 with CUDA 11.3? Does anyone know why I get this error and how I can fix it?

Thank you very much

ptrblck · April 1, 2022, 4:52am

Could you update to the nightly build with CUDA 11.5? I cannot reproduce the issue with it.
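
If upgrading is not immediately possible, one possible stopgap (an assumption, not something confirmed in this thread) is to run the inversion on the CPU and move the result back to the original device. This avoids the cusolverDnDgetrf call named in the error message, at the cost of a host/device round trip. The helper name inv_on_cpu is hypothetical.

```python
import torch


def inv_on_cpu(matrix):
    """Invert on the CPU, then move the result back to the input's device."""
    return torch.linalg.inv(matrix.cpu()).to(matrix.device)


# Falls back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

normalization = torch.tensor([[[0.0138, 0.0000, 0.0000, -2.6834],
                               [0.0000, 0.0138, 0.0000, -2.2656],
                               [0.0000, 0.0000, 0.0138, -1.2021],
                               [0.0000, 0.0000, 0.0000,  1.0000]]],
                             dtype=torch.float64, device=device)

inverse = inv_on_cpu(normalization)

# Check the result: inverse @ normalization should be the identity.
identity = torch.eye(4, dtype=torch.float64, device=device)
print(torch.allclose(inverse @ normalization, identity))
```

For small matrices such as a single 4x4 transform the transfer overhead should be negligible; for large batches it may be worth measuring before relying on this.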

Sureerat_Reaungamorn · April 1, 2022, 2:39pm

Thank you very much for the quick response! We will have to upgrade from Ubuntu 16.04.7 to 20.04.3 to be able to use CUDA 11.5, but we will do so.

Thanks so much again.