An error when using GPU

#21

After installing PyTorch this way: pip install -U https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-linux_x86_64.whl, the errors disappear even when you are using torch.backends.cudnn.benchmark = True.

(Evergrow) #22

Thanks! But I want to know how to solve this problem with PyTorch 1.0.0, CUDA 9.0, and an RTX 2080. Do I have to switch to CUDA 10.0?

#23

I don’t have RTX 2080 cards, but chances are that the driver shipped with CUDA 9.0 is not fully compatible with the RTX 2080. I installed CUDA 10.1 at first. After that, I downgraded CUDA to 10.0 without changing the driver. I hope this helps.

(Yotam Gil) #24

Hi,

I see the same issue with PyTorch 1.0.1.post2, CUDA 10.0, and an RTX 2080 Ti. I can run on other GPUs (I tried a Titan V and a 1080 Ti), but when running on the 2080 Ti with benchmark=True, I get this error message:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "", line 330, in <module>
    train(epoch)
  File "", line 173, in train
    stereo_out, theta, right_transformed = model(left,right)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 139, in forward
    right_img_transformed, theta = self.stn(right_img)
  File "", line 127, in stn
    x,theta1 = stn(x, self.theta(x), mode=self.stn_mode)
  File "", line 131, in theta
    xs = self.localization(x)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:405

Is there already a solution for that?

Thanks, Yotam

#25

I’ve got the same hardware (RTX 2080 Ti) and this fixed it for me. I had to update PyTorch to a build that uses CUDA 10.

(Riz) #26

Thanks a lot. It works. But why does it work?

(Pankaj Kabra) #27

Thanks very much, this works for me! Phew!

#28

I got the same error even after updating CUDA to 10.
I happened to find a way to get rid of it, and now my training code works.

  • cuda: 10.0
  • python: 3.7
  • pytorch: 1.0
  • cudnn: 7
  • GPU: 2080ti

However, another problem came up: when running under with torch.no_grad(), the outputs are all NaNs.
Does anyone know why?

(Qin) #29

Hi, I still have the problem with CUDA 10 and a 2080 Ti. Could you share your solution, please? @ janehu

(Quang Ngoc) #30

Setting torch.backends.cudnn.benchmark = True worked for me.

(Home Wave) #31

Thanks, I hit the same error after updating PyTorch from 1.0 to 1.1 with an RTX 2080 Ti. Setting cudnn.benchmark = False avoids the error, but in PyTorch 1.0 cudnn.benchmark = True caused no problem.
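
For anyone who wants to try the workaround, here is a minimal sketch of switching the flag off before building the model. The helper name configure_cudnn is made up for illustration and is not part of PyTorch; the import guard is only there so the snippet also runs on a machine without PyTorch installed.

```python
def configure_cudnn(benchmark=False):
    """Set torch.backends.cudnn.benchmark and return the value now in effect.

    configure_cudnn is a hypothetical helper name, not a PyTorch API.
    Returns None if PyTorch is not installed.
    """
    try:
        import torch
    except ImportError:
        return None
    # benchmark=True lets cuDNN auto-tune convolution algorithms;
    # benchmark=False is the workaround reported in this thread.
    torch.backends.cudnn.benchmark = benchmark
    return torch.backends.cudnn.benchmark


if __name__ == "__main__":
    print("cudnn.benchmark is now:", configure_cudnn(False))
```

Call it once at the top of the training script, before the first forward pass. Whether it removes the error may depend on the PyTorch and CUDA build, as the mixed reports above suggest.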

#32

Could you post a small reproducible code snippet and print the PyTorch, CUDA and cudnn version so that we can have a look?
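
Something along these lines should print the requested versions. It is only a sketch: collect_versions is a hypothetical helper name, and the import guard just lets it run where PyTorch is missing. The torch attributes used (torch.__version__, torch.version.cuda, torch.backends.cudnn.version()) are the standard ones.

```python
import platform


def collect_versions():
    """Gather the version details asked for above into a dict.

    collect_versions is a hypothetical helper name, not a PyTorch API.
    """
    info = {"python": platform.python_version(), "torch": None}
    try:
        import torch
    except ImportError:
        return info  # PyTorch not installed; only the Python version is known
    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()
    info["cudnn_benchmark"] = torch.backends.cudnn.benchmark
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        info["compute_capability"] = torch.cuda.get_device_capability(0)
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print("{}: {}".format(key, value))
```

Note that cuda_build is the CUDA version the PyTorch binary was compiled against, which can differ from the CUDA toolkit installed on the system.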