An error when using GPU

After installing PyTorch this way: pip install -U https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-linux_x86_64.whl, the errors disappear, even when you are using 'torch.backends.cudnn.benchmark = True'.

Thanks! But I want to know how to solve this problem with PyTorch 1.0.0, CUDA 9.0, and an RTX 2080. Do I have to switch to CUDA 10.0?

I don’t have RTX 2080 cards, but chances are that the driver shipped with CUDA 9.0 is not fully compatible with the RTX 2080. I installed CUDA 10.1 at first, then downgraded CUDA to 10.0 without changing the driver. Hope this helps.

Hi,

I see the same issue, with pytorch 1.0.1.post2, CUDA10.0, RTX2080Ti. I can run on another GPU (tried TitanV and 1080Ti), but if running on the 2080Ti, with benchmark=True, I get this error message:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "", line 330, in <module>
    train(epoch)
  File "", line 173, in train
    stereo_out, theta, right_transformed = model(left,right)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 139, in forward
    right_img_transformed, theta = self.stn(right_img)
  File "", line 127, in stn
    x,theta1 = stn(x, self.theta(x), mode=self.stn_mode)
  File "", line 131, in theta
    xs = self.localization(x)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:405

Is there already a solution for that?

Thanks, Yotam

I’ve got the same hardware (RTX 2080ti) and this fixed it for me. I had to update pytorch to use CUDA 10.

Thanks a lot. It works. But why does it work?

Thanks very much this works for me! phew!

I got the same error, even after updating CUDA to 10.
I happened to find a way to get rid of it, and now my training code works.

  • cuda: 10.0
  • python: 3.7
  • pytorch: 1.0
  • cudnn: 7
  • GPU: 2080ti

However, another problem came up: when running under 'with torch.no_grad()', the outputs are all NaNs.
Does anyone know why?

Hi, I still have the problem with CUDA 10 and a 2080 Ti. Could you share your solution, please? @janehu

Setting torch.backends.cudnn.benchmark = True worked for me.

Thanks. I hit the same error after updating PyTorch from 1.0 to 1.1 on an RTX 2080 Ti. Setting cudnn.benchmark = False avoids the error, but with PyTorch 1.0, cudnn.benchmark = True caused no problem.
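Since the posts above report different results for the two cudnn.benchmark settings, here is a minimal sketch for checking which setting works on a given machine. The helper name try_benchmark_settings and the small convolution it runs are assumptions for illustration, not code from anyone in this thread, and a toy layer may not trigger the error that a full model does.

```python
try:
    import torch
    import torch.nn as nn
except ImportError:  # allows the sketch to run on a machine without PyTorch
    torch = None
    nn = None


def try_benchmark_settings(device="cuda"):
    """Hypothetical helper: run one conv forward pass with
    cudnn.benchmark set to True and then False, and record the outcome."""
    results = {}
    if torch is None or not torch.cuda.is_available():
        return results  # nothing to test without a CUDA-enabled PyTorch

    original = torch.backends.cudnn.benchmark
    try:
        for flag in (True, False):
            torch.backends.cudnn.benchmark = flag
            conv = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
            x = torch.randn(2, 3, 64, 64, device=device)
            try:
                conv(x)
                torch.cuda.synchronize()  # surface asynchronous CUDA errors here
                results[flag] = "ok"
            except RuntimeError as err:
                results[flag] = "failed: {}".format(err)
    finally:
        torch.backends.cudnn.benchmark = original  # restore the previous setting
    return results


for flag, outcome in try_benchmark_settings().items():
    print("cudnn.benchmark = {}: {}".format(flag, outcome))
```

If both settings fail, the cause is more likely the CUDA/driver/GPU combination than the flag itself.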

Could you post a small reproducible code snippet and print the PyTorch, CUDA and cudnn version so that we can have a look?
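For anyone replying, a minimal sketch of how these versions could be collected is below. The function name describe_environment is an assumption for illustration; the torch attributes it reads are standard, and the driver version is not included because PyTorch does not report it this way.

```python
import platform


def describe_environment():
    """Hypothetical helper: collect the versions usually asked for in bug reports."""
    info = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        info["pytorch"] = None  # PyTorch is not installed in this environment
        return info

    info["pytorch"] = torch.__version__
    info["cuda (pytorch build)"] = torch.version.cuda  # None for CPU-only builds
    info["cudnn"] = torch.backends.cudnn.version()  # None if cuDNN is unavailable
    info["cudnn.benchmark"] = torch.backends.cudnn.benchmark
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute capability"] = torch.cuda.get_device_capability(0)
    return info


for key, value in describe_environment().items():
    print("{}: {}".format(key, value))
```

Note that torch.version.cuda is the CUDA version the PyTorch binary was built against, which can differ from the toolkit installed system-wide.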

FYI, I’m getting this in some venvs and not others, both with torch 1.1.0 and an RTX 2080, so it looks environment- or dependency-related.

Are you using CUDA10 for the RTX2080?

I met the same error

 RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:383

with environment:

pytorch: 1.1.0
cuda: 9.0.176
GPU: RTX2080 Ti
Driver: 418.67

After changing the CUDA version to 10.0.130, the issue was solved.

Interesting, I get this error on cuda release 10.2 as well (V10.2.89).

Edit: Got it fixed by following ptrblck’s solution from here.

I also hit this error.
My GPU is a GeForce RTX 2080 Ti.
After I upgraded CUDA from 8.0 to 10.1 and PyTorch from 0.3.0 to 1.4.0, the error was fixed.

This also works for me, many thanks!

Same error here:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=47 error=804

CUDA: release 10.1, V10.1.243
pytorch: 1.5.0+cu101
torchvision: 0.6.0+cu101
GPU: 2080 TI
docker: 19.03.8

Tried every solution mentioned above, but nothing worked.

It depends on your GPU's architecture; see https://en.wikipedia.org/wiki/CUDA.
The GeForce RTX 2080 is a Turing (microarchitecture) card.
I got the error when using CUDA 9.0 because my GPU, a Quadro RTX 5000, is also Turing, which is not compatible with CUDA 9.2 or below,
only with CUDA 10 and above.
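To make that rule concrete, here is a rough sketch that checks a CUDA version against a GPU's compute capability. The table and the function name cuda_supports_gpu are assumptions for illustration; the table covers only the architectures mentioned in this thread and lists the first CUDA release that can target each one.

```python
# First CUDA toolkit release that supports each compute capability.
# Partial table, limited to the GPUs discussed in this thread.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080 Ti
    (7, 0): (9, 0),   # Volta, e.g. Titan V
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 / 2080 Ti, Quadro RTX 5000
}


def cuda_supports_gpu(cuda_version, capability):
    """Hypothetical helper.

    cuda_version: string such as "9.0.176" or "10.0".
    capability:   (major, minor) tuple such as (7, 5).
    """
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        raise KeyError("capability {} is not in this partial table".format(capability))
    return (major, minor) >= required


print(cuda_supports_gpu("9.0.176", (7, 5)))   # False: Turing needs CUDA 10.0+
print(cuda_supports_gpu("10.0.130", (7, 5)))  # True
print(cuda_supports_gpu("9.0.176", (7, 0)))   # True: Volta works with CUDA 9.0
```

On a live system the two inputs can be read from torch.version.cuda and torch.cuda.get_device_capability(). This check only covers the architecture requirement, so it would not explain the errors reported above on CUDA 10.1 and 10.2.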