torch.inverse CUDA error

cbd · December 14, 2021, 11:49am

I used the line "a = torch.inverse(mat_inverse)" in my code. It works fine on Google Colab but raises the error below when run on my GPU.

RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

With CUDA_LAUNCH_BLOCKING=1 it shows the error below. Is it related to memory requirements?
RuntimeError: CUDA error: invalid argument

ptrblck · December 15, 2021, 2:21am

Could you post a minimal code snippet to reproduce the issue as well as the output of python -m torch.utils.collect_env, please?

cbd · December 16, 2021, 2:52am

It was a memory issue, as the GPU is shared.

ptrblck · December 16, 2021, 7:14am

An invalid argument error is usually not raised if you are running out of memory.
Do you remember which operation raised it when you saw the stack trace (in case you were running with CUDA_LAUNCH_BLOCKING=1)?
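For readers who have not used this setting before, here is a minimal sketch of one way to apply it from Python. It starts the script in a child process with CUDA_LAUNCH_BLOCKING=1 in its environment, which should be equivalent to prefixing the command with the variable in a shell. The helper name run_with_blocking_launches and the script name repro.py are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys


def run_with_blocking_launches(args):
    """Run `python <args>` in a child process with CUDA_LAUNCH_BLOCKING=1 set."""
    # Copy the current environment and add the variable, so it is already
    # in place when the child process initializes CUDA.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(
        [sys.executable, *args],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6 too)
    )


# Hypothetical usage: "repro.py" stands for the script that calls torch.inverse.
# result = run_with_blocking_launches(["repro.py"])
# print(result.stderr)  # inspect the traceback reported by the child process
```

With blocking launches, kernel errors are reported at the call that caused them, so the last PyTorch frame in the traceback is a much more reliable hint about the failing operation.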

cbd · December 16, 2021, 7:48am

You are right. It's not a memory issue. The function that raises the error is “torch.inverse”. Please find the details you asked for below. Let me know if you can resolve it.

PyTorch version: 1.10.0+cu113
Is debug build: False
CUDA used to build PyTorch: 11.3
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.2 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.25

Python version: 3.6.9 (default, Jan 26 2021, 15:33:00)  [GCC 8.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-91-generic-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 10.0.130
GPU models and configuration: GPU 0: Quadro P5000
Nvidia driver version: 450.119.03
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] torch==1.10.0+cu113
[pip3] torchfile==0.1.0
[pip3] torchvision==0.11.1+cu113
[pip3] torchviz==0.0.2

ptrblck · December 16, 2021, 7:51am

Could you post an executable code snippet to reproduce the issue?

cbd · December 16, 2021, 8:13am

It's strange, but changing the array shape from (1, 3, 256, 256) to (1, 14400, 3, 3) in the code below triggers the error on the GPU. I verified on Google Colab that it works fine there. What is the cause of the issue, and is there a solution?

import torch
# mat_inverse = torch.randn(1, 3, 256, 256).to(0)  # this shape works
mat_inverse = torch.rand(1, 14400, 3, 3).to(0)  # this shape raises the error on the GPU
a = torch.inverse(mat_inverse)
print(a)

cbd · December 21, 2021, 11:57am

Have you arrived at a conclusion?

Frank_Yang · August 3, 2022, 2:32pm

I also encountered this problem when executing torch.inverse, and I solved it by:

torch.inverse(params['affine'].to('cpu')).to('cuda')
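The same idea can be wrapped in a small helper that returns the result on whatever device the input came from, rather than hard-coding 'cuda'. This is only a sketch: the name inverse_via_cpu is hypothetical, and the scaled identity added to the test matrices is my own assumption to keep them well conditioned, not part of the original example.

```python
import torch


def inverse_via_cpu(mat: torch.Tensor) -> torch.Tensor:
    """Invert a batch of square matrices on the CPU and return the result on mat's device."""
    # Remember where the input lives so the result can be moved back.
    original_device = mat.device
    # torch.inverse accepts (..., n, n) input and inverts each matrix in the batch.
    inv_cpu = torch.inverse(mat.to("cpu"))
    return inv_cpu.to(original_device)


# Same shape as the failing example earlier in the thread.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Assumption for illustration: adding 3 * I makes every matrix strictly
# diagonally dominant, so each one is guaranteed to be invertible.
mat_inverse = torch.rand(1, 14400, 3, 3, device=device) + 3 * torch.eye(3, device=device)
a = inverse_via_cpu(mat_inverse)
print(a.shape, a.device)
```

The round trip adds a device-to-host and a host-to-device copy on every call, so it is probably best treated as a stopgap until the GPU path works.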

ptrblck · August 4, 2022, 12:44am

Sorry for not following up earlier; I might have missed your follow-up.
I cannot reproduce the issue in the current release (1.12.0+cu116) using your code, so could you update PyTorch and check if you are still seeing the error?

CC @Frank_Yang

huangxin168 · March 3, 2023, 6:20am

Thank you very much, you saved me!