An error when using GPU

The error message is “THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument”, but I can’t find THCGeneral.cpp anywhere.
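The file= path in this message is where THCGeneral.cpp lived on the machine that built the PyTorch binary, so it won’t exist on your disk. In the PyTorch 1.0 source tree it is aten/src/THC/THCGeneral.cpp. Below is a minimal sketch for pulling the file, line and error code out of a log, assuming the message keeps exactly the layout shown above. THCudaCheckFailure, parse_thcudacheck and source_tree_path are hypothetical helpers, not part of PyTorch.

```python
import re
from typing import NamedTuple, Optional


class THCudaCheckFailure(NamedTuple):
    """Fields extracted from one 'THCudaCheck FAIL' line (hypothetical helper type)."""
    file: str        # path on the build machine, not on your disk
    line: int        # line number inside that source file
    error_code: int  # numeric CUDA runtime error code as printed
    message: str     # human-readable error string, e.g. "invalid argument"


# Assumes the layout printed by PyTorch 1.0:
# THCudaCheck FAIL file=<path> line=<n> error=<code> : <message>
_THCUDACHECK_RE = re.compile(
    r"THCudaCheck FAIL file=(?P<file>\S+) line=(?P<line>\d+) "
    r"error=(?P<code>\d+) : (?P<msg>[^\r\n]+)"
)


def parse_thcudacheck(text: str) -> Optional[THCudaCheckFailure]:
    """Return the first THCudaCheck failure found in text, or None."""
    match = _THCUDACHECK_RE.search(text)
    if match is None:
        return None
    return THCudaCheckFailure(
        file=match.group("file"),
        line=int(match.group("line")),
        error_code=int(match.group("code")),
        message=match.group("msg").strip(),
    )


def source_tree_path(file_path: str, build_root: str = "/pytorch/") -> str:
    """Strip the build-machine prefix (assumed to be /pytorch/) to get a repo-relative path."""
    if file_path.startswith(build_root):
        return file_path[len(build_root):]
    return file_path
```

Applied to the message above, this yields line 405, error code 11 and the repo-relative path aten/src/THC/THCGeneral.cpp.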

Is your code running fine on the CPU? Could you post the whole stack trace?

I am also having the same issue with a 2080 Ti.

If I enable cudnn.benchmark, my code gives the error and crashes. If I disable cudnn.benchmark, my code still gives the same error but keeps running. Adding CUDA_LAUNCH_BLOCKING=1 doesn’t give any more details; it still shows that the code crashes when it reaches the first convolution.

The code was running fine on a 1080 Ti with cudnn.benchmark enabled.
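CUDA_LAUNCH_BLOCKING generally has to be in the environment before the process initializes CUDA, so it is often easiest to set it from outside, when launching the script. Below is a minimal sketch that builds such an environment for a child process. cuda_debug_env and run_with_debug_env are hypothetical helpers, and any script name you pass in is a placeholder.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def cuda_debug_env(
    base: Optional[Mapping[str, str]] = None,
    visible_device: Optional[int] = None,
) -> Dict[str, str]:
    """Return a copy of base (default: os.environ) with CUDA debugging variables set.

    The input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    # Make kernel launches synchronous so errors surface at the call that caused them.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    if visible_device is not None:
        # Number GPUs by PCI bus ID (the order nvidia-smi uses) and expose only one.
        env["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
        env["CUDA_VISIBLE_DEVICES"] = str(visible_device)
    return env


def run_with_debug_env(script: str, visible_device: Optional[int] = None) -> int:
    """Run a Python script in a fresh interpreter with the debug environment; return its exit code."""
    return subprocess.call(
        [sys.executable, script],
        env=cuda_debug_env(visible_device=visible_device),
    )
```

For example, run_with_debug_env("train.py", visible_device=1) would run a (hypothetical) train.py on the second GPU in PCI bus order with synchronous launches.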


Same here: I get this error message on my RTX 2080 Ti but not on the 1080 Ti, with the same PyTorch (1.0.0), CUDA (10.0.130) and Python (3.5.2).

Code to reproduce the warning/error:

import os
import torch

# force torch to use my RTX 2080TI GPU, modify or remove accordingly
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
torch.backends.cudnn.benchmark = True

from torchvision.models import vgg16
model = vgg16().cuda()
x = torch.zeros((32, 3, 227, 227)).cuda()
model(x)

Prints the error (THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument) and doesn’t return anything.

I have an RTX 2070 with CUDA 10, PyTorch 1.0 and Python 3.6 on Ubuntu 18, and I get this error when running this project: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

With torch.backends.cudnn.benchmark = True, I get the stack trace below and the program exits.

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
  File "test.py", line 60, in <module>
    model.test()           # run inference
  File "/home/jwickens/dev/face-translation/pytorch-CycleGAN-and-pix2pix/models/base_model.py", line 105, in test
    self.forward()
  File "/home/jwickens/dev/face-translation/pytorch-CycleGAN-and-pix2pix/models/test_model.py", line 65, in forward
    self.fake_B = self.netG(self.real_A)  # G(A)
  File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jwickens/dev/face-translation/pytorch-CycleGAN-and-pix2pix/models/networks.py", line 399,in forward
    return self.model(input)
  File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
    input = module(input)
  File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663

Without that line, I get the CUDA error once at the beginning, but it is non-fatal and the script still works:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument

I also get the same non-fatal error with this tutorial: https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html

There are quite a few issues out there for this error message; some users blame CUDA 9.2 and others RTX cards.
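The reports above distinguish two outcomes: with cudnn.benchmark = True the message is followed by a crash, and without it the message is printed once and the script keeps going. Below is a minimal sketch for telling these apart from outside the process, assuming the text "THCudaCheck FAIL" shows up on stdout or stderr. classify_run and python_cmd are hypothetical helpers.

```python
import subprocess
import sys
from typing import List, Mapping, Optional, Sequence

MARKER = "THCudaCheck FAIL"


def python_cmd(script: str, *args: str) -> List[str]:
    """Build a command that runs script with the current interpreter."""
    return [sys.executable, script] + list(args)


def classify_run(
    cmd: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
    timeout: Optional[float] = None,
) -> str:
    """Run cmd and label the outcome.

    Returns one of:
      "clean"     exit code 0, no THCudaCheck message
      "non-fatal" exit code 0, but a THCudaCheck message was printed
      "fatal"     non-zero exit code and a THCudaCheck message
      "failed"    non-zero exit code for some other reason
    """
    proc = subprocess.run(
        list(cmd),
        env=None if env is None else dict(env),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.5+)
        timeout=timeout,
    )
    saw_marker = MARKER in proc.stdout or MARKER in proc.stderr
    if proc.returncode != 0:
        return "fatal" if saw_marker else "failed"
    return "non-fatal" if saw_marker else "clean"
```

Going by the reports in this thread, running a benchmark-on and a benchmark-off variant of a repro script through classify_run(python_cmd(...)) on an affected setup would be expected to give "fatal" and "non-fatal" respectively.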


RTX 2080 Ti with CUDA 10.0: I got the same problem. I followed the advice of others and changed `torch.backends.cudnn.benchmark = True` to False, and things started working again.
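If turning cudnn.benchmark off globally is too blunt, the flag can be flipped only around the failing call and restored afterwards. Below is a minimal sketch using a generic context manager. temporarily_set is a hypothetical helper; it works on any object with a settable attribute, such as the torch.backends.cudnn module.

```python
import contextlib
from typing import Any, Iterator


@contextlib.contextmanager
def temporarily_set(obj: Any, attr: str, value: Any) -> Iterator[None]:
    """Set obj.attr to value inside the with-block, then restore the previous value.

    The old value is restored even if the block raises.
    """
    previous = getattr(obj, attr)
    setattr(obj, attr, value)
    try:
        yield
    finally:
        setattr(obj, attr, previous)


# Usage with PyTorch (not executed here; model and x are placeholders):
#   import torch
#   with temporarily_set(torch.backends.cudnn, "benchmark", False):
#       out = model(x)
```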


After installing PyTorch this way, pip install -U https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-linux_x86_64.whl, the errors disappear even when using torch.backends.cudnn.benchmark = True.
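The cu100 path segment in that URL selects a PyTorch 1.0.0 build compiled against CUDA 10.0, and cp36 marks it as a CPython 3.6 wheel. The filename follows the standard wheel naming scheme (name-version-[build-]python-abi-platform.whl). Below is a minimal sketch that reads those fields back out, assuming the cuXY directory convention used by download.pytorch.org, with a fallback to a local version label such as +cu92. parse_wheel_url and WheelInfo are hypothetical helpers.

```python
import posixpath
import re
from typing import NamedTuple, Optional
from urllib.parse import unquote, urlparse


class WheelInfo(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str
    cuda: Optional[str]  # e.g. "10.0", or None if no CUDA tag was found


_CUDA_TAG_RE = re.compile(r"cu(\d+)(\d)")


def _cuda_from_tag(tag: str) -> Optional[str]:
    """Turn a tag like 'cu100' or 'cu92' into '10.0' or '9.2'."""
    m = _CUDA_TAG_RE.fullmatch(tag)
    return "{}.{}".format(m.group(1), m.group(2)) if m else None


def parse_wheel_url(url: str) -> WheelInfo:
    path = unquote(urlparse(url).path)
    filename = posixpath.basename(path)
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:  # optional build tag present
        name, version, _build, py, abi, plat = parts
    elif len(parts) == 5:
        name, version, py, abi, plat = parts
    else:
        raise ValueError("unexpected wheel filename: " + filename)
    # Assumed convention: a directory such as /whl/cu100/ names the CUDA build;
    # otherwise fall back to a local version label such as 1.x.y+cu92.
    cuda = _cuda_from_tag(posixpath.basename(posixpath.dirname(path)))
    if cuda is None and "+" in version:
        cuda = _cuda_from_tag(version.split("+", 1)[1])
    return WheelInfo(name, version, py, abi, plat, cuda)
```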


Thanks! But how can I solve this problem with PyTorch 1.0.0, CUDA 9.0 and an RTX 2080? Do I have to switch to CUDA 10.0?

I don’t have RTX 2080 cards, but chances are that the driver shipped with CUDA 9.0 is not fully compatible with the RTX 2080. I first installed CUDA 10.1, then downgraded CUDA to 10.0 without changing the driver. Hope this helps.
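One way to check the compatibility concern is to compare the GPU’s compute capability with the CUDA version PyTorch was built against. RTX 20-series (Turing) cards report compute capability 7.5, and native support for it first appeared in CUDA 10.0. That fits the reports here that CUDA 10 builds behave better. Below is a minimal sketch with a deliberately partial lookup table; MIN_CUDA_FOR_CAPABILITY and toolkit_supports are hypothetical, so extend the table from NVIDIA’s documentation for other GPUs. It only checks native support: a binary built with an older toolkit may still run on a newer GPU through PTX JIT compilation, so False means "not supported natively", not "cannot run".

```python
from typing import Dict, Optional, Sequence, Tuple

# Partial table: compute capability -> first CUDA toolkit (major, minor) with native support.
MIN_CUDA_FOR_CAPABILITY: Dict[Tuple[int, int], Tuple[int, int]] = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080 Ti
    (7, 0): (9, 0),   # Volta, e.g. Titan V
    (7, 5): (10, 0),  # Turing, e.g. RTX 2070 / 2080 / 2080 Ti
}


def parse_cuda_version(version: str) -> Tuple[int, int]:
    """'10.0.130' -> (10, 0); '9.0' -> (9, 0)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version: str, capability: Sequence[int]) -> Optional[bool]:
    """True/False if the table knows this capability, None if it does not."""
    required = MIN_CUDA_FOR_CAPABILITY.get((capability[0], capability[1]))
    if required is None:
        return None
    return parse_cuda_version(cuda_version) >= required


# Usage with PyTorch (not executed here; torch.version.cuda is None on CPU-only builds):
#   import torch
#   toolkit_supports(torch.version.cuda, torch.cuda.get_device_capability(0))
```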


Hi,

I see the same issue with PyTorch 1.0.1.post2, CUDA 10.0 and an RTX 2080 Ti. I can run on other GPUs (I tried a Titan V and a 1080 Ti), but when running on the 2080 Ti with benchmark=True, I get this error message:

THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
  File "", line 330, in <module>
    train(epoch)
  File "", line 173, in train
    stereo_out, theta, right_transformed = model(left,right)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "", line 139, in forward
    right_img_transformed, theta = self.stn(right_img)
  File "", line 127, in stn
    x,theta1 = stn(x, self.theta(x), mode=self.stn_mode)
  File "", line 131, in theta
    xs = self.localization(x)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:405

Is there already a solution for that?

Thanks, Yotam

I’ve got the same hardware (RTX 2080 Ti) and this fixed it for me. I had to update PyTorch to a build that uses CUDA 10.

Thanks a lot. It works. But why does it work?

Thanks very much, this works for me! Phew!

I’ve got the same error, even after updating CUDA to 10. I happened to find a way to get rid of it, and now my training code works.

  • cuda: 10.0
  • python: 3.7
  • pytorch: 1.0
  • cudnn: 7
  • GPU: 2080ti

However, another problem came up: when running inside a with torch.no_grad() block, the outputs are all NaNs. Does anyone know about this?

Hi, I still have the problem with CUDA 10 and a 2080 Ti. Could you share your solution, please? @janehu

Setting torch.backends.cudnn.benchmark = True worked for me.

Thanks. I ran into the same error after updating PyTorch from 1.0 to 1.1 with an RTX 2080 Ti. Setting cudnn.benchmark = False helps avoid this error, but in PyTorch 1.0, cudnn.benchmark = True caused no problem. 😅

Could you post a small reproducible code snippet and print the PyTorch, CUDA and cuDNN versions so that we can have a look?
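Below is a minimal sketch for collecting the requested versions in one place. It uses only attributes that PyTorch exposes: torch.__version__, torch.version.cuda, torch.backends.cudnn.version(), torch.cuda.get_device_name() and torch.cuda.get_device_capability(). env_report is a hypothetical helper that still returns the Python version when torch is not installed. Recent PyTorch releases also ship python -m torch.utils.collect_env, which prints a more complete report.

```python
import platform
from typing import Any, Dict


def env_report() -> Dict[str, Any]:
    """Collect Python, PyTorch, CUDA, cuDNN and GPU details for a bug report."""
    report: Dict[str, Any] = {"python": platform.python_version()}
    try:
        import torch
    except ImportError:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report
    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda         # None for CPU-only builds
    report["cudnn"] = torch.backends.cudnn.version()  # integer version, or None
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    else:
        report["gpu"] = None
    return report


if __name__ == "__main__":
    for key, value in env_report().items():
        print("{}: {}".format(key, value))
```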

FYI, I’m getting this in some venvs and not in others, all with torch 1.1.0 and an RTX 2080, so it looks like it’s related to the environment or dependencies.
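When the same torch version behaves differently across venvs, diffing the installed packages is a quick way to find the dependency that differs. Below is a minimal sketch, assuming each venv’s package list was saved with pip freeze. parse_freeze and diff_envs are hypothetical helpers. Editable installs and requirements not pinned with == are ignored, and names are normalized the way PEP 503 does.

```python
import re
from typing import Dict, Iterable, Optional, Tuple


def _normalize(name: str) -> str:
    """Normalize a project name as PEP 503 does (case-insensitive, runs of -_. become -)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_freeze(lines: Iterable[str]) -> Dict[str, str]:
    """Map normalized package name -> pinned version from `pip freeze` output."""
    packages: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        # Skip blanks, comments, editable installs and anything not pinned with ==.
        if not line or line.startswith(("#", "-e")) or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[_normalize(name.strip())] = version.strip()
    return packages


def diff_envs(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Packages whose version differs (None means missing from that environment)."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Usage (file names are placeholders for two saved `pip freeze` outputs):
#   with open("good.txt") as f1, open("bad.txt") as f2:
#       for name, (good, bad) in diff_envs(parse_freeze(f1), parse_freeze(f2)).items():
#           print(name, good, bad)
```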

Are you using CUDA 10 for the RTX 2080?
