The error message is “THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument”. But I can’t find the THCGeneral.cpp.
Is your code running fine on the CPU? Could you post the whole stack trace?
I am also having the same issue with 2080Ti.
If I enable cudnn.benchmark
, my code gives the error and crashes. If I disable cudnn.benchmark
, my code still gives the same error but it can still run. Adding CUDA_LAUNCH_BLOCKING=1
doesn’t give anymore details. It stills shows that the code crashes when it reaches the first convolution.
The code was running fine on 1080Ti with cudnn.benchmark
enabled.
Same here, I get this error message on my RTX 2080 Ti but not on the 1080 Ti, same Pytorch (1.0.0) and CUDA (10.0.130), python 3.5.2.
Code to produce the warning/error:
import os
import torch
# force torch to use my RTX 2080TI GPU, modify or remove accordingly
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "1"
torch.backends.cudnn.benchmark = True
from torchvision.models import vgg16
model = vgg16().cuda()
x = torch.zeros((32, 3, 227, 227)).cuda()
model(x)
Prints the error (THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
) and doesn’t return anything.
I have a RTX 2070 with CUDA 10, pytorch 1.0, python 3.6 on Ubuntu 18 and I get this error when running this project: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
With torch.backends.cudnn.benchmark = True
I get the below stack trace and the program exits.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
Traceback (most recent call last):
File "test.py", line 60, in <module>
model.test() # run inference
File "/home/jwickens/dev/face-translation/pytorch-CycleGAN-and-pix2pix/models/base_model.py", line 105, in test
self.forward()
File "/home/jwickens/dev/face-translation/pytorch-CycleGAN-and-pix2pix/models/test_model.py", line 65, in forward
self.fake_B = self.netG(self.real_A) # G(A)
File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 121, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jwickens/dev/face-translation/pytorch-CycleGAN-and-pix2pix/models/networks.py", line 399,in forward
return self.model(input)
File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
result = self.forward(*input, **kwargs)
File "/home/jwickens/.local/lib/python3.6/site-packages/torch/nn/modules/conv.py", line 301, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:663
Without that line I get a silent CUDA error once at the beginning. The script works though. THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
I also have the same silent error with this tutorial https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
There are quite a few issues out here for this error message, some users say its cuda 9.2 and others RTX cards.
RTX 2080ti with cuda 10.0. I got the same problem. I followed the advice of others and turned `torch.backends.cudnn.benchmark = True’ to False and things started working again
After installing Pytorch in this way: pip install -U https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-linux_x86_64.whl, the errors will disappear even when you are using 'torch.backends.cudnn.benchmark = True’
Thanks! But I want to know how to solve this problem on Pytorch 1.0.0, CUDA 9.0, RTX 2080. Must change to CUDA 10.0?
I don’t have RTX 2080 cards and chances are that the driver shipped with CUDA 9.0 is not fully compatible with RTX 2080. I installed CUDA 10.1 at first. After that, I downgraded the CUDA version to 10.0 while not changing the driver. Hope this can help you.
Hi,
I see the same issue, with pytorch 1.0.1.post2, CUDA10.0, RTX2080Ti. I can run on another GPU (tried TitanV and 1080Ti), but if running on the 2080Ti, with benchmark=True, I get this error message:
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=11 : invalid argument
Traceback (most recent call last):
File "", line 330, in <module>
train(epoch)
File "", line 173, in train
stereo_out, theta, right_transformed = model(left,right)
File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "", line 139, in forward
right_img_transformed, theta = self.stn(right_img)
File "", line 127, in stn
x,theta1 = stn(x, self.theta(x), mode=self.stn_mode)
File "", line 131, in theta
xs = self.localization(x)
File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/container.py", line 92, in forward
input = module(input)
File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
result = self.forward(*input, **kwargs)
File "/home/yotamg/PycharmProjects/PSMNet/venv3/local/lib/python2.7/site-packages/torch/nn/modules/conv.py", line 320, in forward
self.padding, self.dilation, self.groups)
RuntimeError: cuda runtime error (11) : invalid argument at /pytorch/aten/src/THC/THCGeneral.cpp:405
Is there already a solution for that?
Thanks, Yotam
I’ve got the same hardware (RTX 2080ti) and this fixed it for me. I had to update pytorch to use CUDA 10.
Thanks a lot. It works. But why does it work?
Thanks very much this works for me! phew!
I’ve got the same error. even after I update CUDA to 10.
I happened to find a way to remove it. Now my code of training works.
- cuda: 10.0
- python: 3.7
- pytorch: 1.0
- cudnn: 7
- GPU: 2080ti
however, another problem came along when run with 'with torch.no_grad(), the output are all nans.
anyone know this?
Hi, I still have the problem with Cuda10 and 2080ti. Could you share your solution, please? @ janehu
set torch.backends.cudnn.benchmark = True worked for me
thanks, l have met the same error when update pytorch1.0 to 1.1 with RTX2080Ti. Setting cudnn.benchmark = False could help to avoid this error, but in pytorch1.0 cudnn.benchmark = True is no problem.
Could you post a small reproducible code snippet and print the PyTorch, CUDA and cudnn version so that we can have a look?
FYI I’m getting this in some venvs and not others both with torch 1.1.0, RTX2080 so looks like it’s environmental / dependency related.
Are you using CUDA10 for the RTX2080?