Matrix inversion fails on GPU (Google Colab)

I’m having trouble performing matrix inversion on the GPU for a matrix that inverts fine on the CPU. I am using Google Colab with torch version 1.3.0+cu100. Here is my code:

import torch
dim = 100

# CPU inversion
A = torch.rand(dim, dim, device='cpu')
Ainv = A.inverse()
print(torch.matmul(A, Ainv))  # should be approximately the identity

# GPU inversion
A = A.to('cuda')
Ainv = A.inverse()
print(torch.matmul(A, Ainv))  # should be approximately the identity

For a small matrix (e.g. dim = 100), I get the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

For a large matrix (e.g. dim = 1000), I get the following error:
RuntimeError: inverse_cuda: U(1,1) is zero, singular U.

In both cases, the inversion goes fine on the CPU, but inverting the same matrix on the GPU fails. Any help is appreciated!

Edit: Running the above code on another workstation with torch version 1.0.1.post2 does not produce this error.


Does the error happen during the inverse or during the matmul in the print?

When dim = 100, it fails on the matmul and we get the cuBLAS error. When dim = 1000, it fails on the inversion step and we get the singular U error.
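One way to confirm which call is actually failing (CUDA errors can surface asynchronously, so the reported line is not always the culprit) is to synchronize after each step. A rough sketch, assuming a CUDA device is available:

import torch

dim = 100
A = torch.rand(dim, dim, device='cuda')

Ainv = A.inverse()
torch.cuda.synchronize()  # if the inverse kernel failed, the error surfaces here
print('inverse finished')

prod = torch.matmul(A, Ainv)
torch.cuda.synchronize()  # if cublasSgemm failed, the error surfaces here
print(prod)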

Reverting to a previous version of PyTorch fixes the errors, which we can do in Colab with:

!pip install torch==1.0.0 torchvision==0.2.1

Hi,

It might be related to some MAGMA updates.
You can follow the progress on this issue: https://github.com/pytorch/hub/issues/62
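While that issue is open, a possible workaround is to move the matrix to the CPU for the inversion and copy the result back to the GPU. A minimal sketch, not a fix for the underlying problem:

import torch

dim = 100
A = torch.rand(dim, dim, device='cuda')

# Workaround sketch: invert on the CPU, then move the result back to the GPU.
Ainv = A.cpu().inverse().to(A.device)
print(torch.matmul(A, Ainv))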

I have the same issue with PyTorch.

Downgrading to torch==1.0.0 torchvision==0.2.1 did not work for me; the same error still persists.
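It may be worth double-checking that the downgrade actually took effect in the runtime where you run the code (Colab usually needs a runtime restart after a pip install). A quick check, assuming a GPU runtime:

import torch

print(torch.__version__)              # expect 1.0.0 after the downgrade
print(torch.version.cuda)             # CUDA version this build was compiled against
print(torch.cuda.is_available())      # confirm the GPU is visible
print(torch.cuda.get_device_name(0))  # which GPU Colab assigned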