Same problem for me. RTX2080Ti, Ubuntu 18.04, CUDA 10
I'm seeing the same problem, though I'm using CUDA 9.
The GPU does seem to be used, although I'm not sure whether it's used "fully".
By the way, for anyone following this topic, a workaround to at least get things running is to set `torch.backends.cudnn.benchmark = False`. I found this solution in this thread: A error when using GPU
That may stop the error from appearing, though it would be interesting to know why the RTX 2080 Ti fails the CUDA benchmark.
For me it also fails the cuDNN installation examples. Can anyone else confirm that?
Same with CUDA 10, Ubuntu 18.04, and a Titan RTX, on Python 3.6.
I put `torch.backends.cudnn.benchmark = False` at the beginning of my source code, but the error message still appears. I'm not sure why.
Are you sure that attribute is not set to True later in the code? Try printing torch.backends.cudnn.benchmark at various points of execution to make sure it is indeed False.
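A minimal sketch of that check, in case it helps. The helper name `report_cudnn_benchmark` and the labels are made up for illustration; only `torch.backends.cudnn.benchmark` comes from this thread.

```python
def report_cudnn_benchmark(label):
    """Print and return the current cuDNN benchmark flag.

    Returns None if torch cannot be imported. The function name and the
    label argument are hypothetical, chosen only for this example.
    """
    try:
        import torch
    except ImportError:
        print("[{}] torch is not installed".format(label))
        return None
    value = torch.backends.cudnn.benchmark
    print("[{}] torch.backends.cudnn.benchmark = {}".format(label, value))
    return value


# Call it wherever another module might have changed the flag.
report_cudnn_benchmark("after imports")
report_cudnn_benchmark("before training loop")
```

If one of the later calls prints True, some code between the two calls (possibly a third-party module) is switching the flag back on.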
Same issue here. PyTorch 1.0.0, CUDA 10.0, RTX 2080, on Fedora 28.
Setting `torch.backends.cudnn.benchmark = False` doesn't work. It shows:
cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Any information would be appreciated!
Okay, I have solved the problem. Installing PyTorch directly did not work; instead, `pip3 install -U https://download.pytorch.org/whl/cu100/torch-1.0.0-cp36-cp36m-linux_x86_64.whl` worked for me.
Thanks for your reply! But I want to know how to solve this problem with PyTorch 1.0.0, CUDA 9.0, and an RTX 2080. Do I have to change to CUDA 10.0?
While some users seem to have got the RTX 2080 working with CUDA 9.x, it seems CUDA 10 is the way to go for these new GPUs.
Hi, I signed up today to share a solution
My setup is a 2080 Ti, CUDA 10.1, and Python 3.6; the installation method is pip3.
I originally installed the latest stable version suggested by the PyTorch home page, https://download.pytorch.org/whl/cu100/torch-1.0.1.post2-cp36-cp36m-linux_x86_64.whl, which produces that error message.
Then I tried the version 1.0.0 suggested by Yiru_Shen, which still produces that error message.
Finally I tried the nightly build, which does NOT produce any error message. Luckily I don't have to build from source.
Welcome and thanks for the sign up and the information!
Thank you! This forum has really helped me a lot, and PyTorch is the best framework I've seen so far in terms of code & documentation.
RTX cards only work with the CUDA 10 toolkit. I use this PyTorch conda environment and it works (both Python 3.5 and Python 3.7):
conda=/usr/local/anaconda3/bin/conda
activate=/usr/local/anaconda3/bin/activate
deactivate=/usr/local/anaconda3/bin/deactivate

# python 3.7
$conda create -y --no-default-packages --prefix /usr/local/pytorch/python3.7/cuda10.0_pytorch_1.0.0 python=3.7
source $activate /usr/local/pytorch/python3.7/cuda10.0_pytorch_1.0.0
$conda install -y pytorch=1.0.0 torchvision cuda100 -c pytorch
source $deactivate

# python 3.5
$conda create -y --no-default-packages --prefix /usr/local/pytorch/python3.5/cuda10.0_pytorch_1.0.0 python=3.5
source $activate /usr/local/pytorch/python3.5/cuda10.0_pytorch_1.0.0
$conda install -y pytorch=1.0.0 torchvision cuda100 -c pytorch
source $deactivate
Hello, I am trying old models on a new RTX 2080 on Ubuntu 16.04 with NVIDIA driver 410.57.
I'm running a legacy deep learning model on PyTorch 0.4.1. The model must use RoI Align and NMS, which are compiled for PyTorch 0.4.1 using ffi instead of cpp and raise an error when run on PyTorch >=1.0. Since PyTorch 0.4.1 only supports CUDA<10.0, I installed CUDA 9.0 (incl. 4 patches) with cuDNN 7.5.1.
I get a silent THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp error, while the model keeps running. After running the model, I must restart before running another model.
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=663 error=11 : invalid argument
/home/ivanwilliam/.virtualenvs/virtual-py3/lib/python3.5/site-packages/torch/nn/functional.py:1961: UserWarning: Default upsampling behavior when mode=trilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
tr. batch 1/230 (ep. 1) fw 27.761s / bw 2.157s / total 29.917s || loss: 1.04, class: 0.88, bbox: 0.16
tr. batch 2/230 (ep. 1) fw 0.864s / bw 0.985s / total 1.849s || loss: 0.95, class: 0.72, bbox: 0.23
tr. batch 3/230 (ep. 1) fw 0.860s / bw 0.985s / total 1.845s || loss: 1.23, class: 0.87, bbox: 0.36
tr. batch 4/230 (ep. 1) fw 0.906s / bw 0.990s / total 1.896s || loss: 1.19, class: 0.87, bbox: 0.33
tr. batch 5/230 (ep. 1) fw 0.908s / bw 0.981s / total 1.889s || loss: 0.79, class: 0.63, bbox: 0.16
tr. batch 6/230 (ep. 1) fw 0.920s / bw 0.652s / total 1.573s || loss: 0.87, class: 0.87, bbox: 0.00
tr. batch 7/230 (ep. 1) fw 0.915s / bw 0.983s / total 1.899s || loss: 1.00, class: 0.79, bbox: 0.22
tr. batch 8/230 (ep. 1) fw 0.883s / bw 0.981s / total 1.864s || loss: 0.84, class: 0.79, bbox: 0.06
tr. batch 9/230 (ep. 1) fw 0.852s / bw 0.987s / total 1.839s || loss: 1.33, class: 0.95, bbox: 0.39
tr. batch 10/230 (ep. 1) fw 0.933s / bw 0.990s / total 1.923s || loss: 1.37, class: 0.95, bbox: 0.43
tr. batch 11/230 (ep. 1) fw 0.929s / bw 0.986s / total 1.915s || loss: 1.11, class: 0.87, bbox: 0.24
tr. batch 12/230 (ep. 1) fw 0.927s / bw 0.987s / total 1.915s || loss: 1.00, class: 0.79, bbox: 0.21
Note: I tried to find the /pytorch/aten/src/THC/THCGeneral.cpp file, but it doesn't exist.
Does the silent error force the model onto the CPU or interfere with other processes?
Do you have a trick for RTX so it can run a PyTorch 0.4.1 model?
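One way to at least see where the model's parameters currently live is sketched below. It does not tell you whether the THCudaCheck failure is harmless; it only reports the devices of the parameters. The helper name `parameter_devices` is made up for this example, and it assumes the model exposes `parameters()` whose items have a `.device` attribute, as `torch.nn.Module` does since 0.4.0.

```python
def parameter_devices(model):
    """Return the sorted, de-duplicated device names of a model's parameters.

    `model` is assumed to behave like a torch.nn.Module: it has a
    parameters() method yielding objects with a .device attribute.
    The helper name is hypothetical.
    """
    return sorted({str(p.device) for p in model.parameters()})


# Usage inside an existing script (model is your own network):
# print(parameter_devices(model))
```

If the list contains only CUDA devices, the parameters have not been moved to the CPU; anything beyond that would need a closer look at the failing call.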
Hi, my GPU is a GTX 1650, with PyTorch 0.4.1 and CUDA 9.0, and I have exactly the same problem. Has anybody solved it?
Turing GPUs are supported using CUDA>=10, so you would need to either update PyTorch to the latest stable release (1.3.1) with CUDA 10.1 or try to build PyTorch 0.4.1 with CUDA>=10.0 from source, if you really need the old version.
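To see which CUDA version an installed PyTorch binary was built with, something like the following sketch may help. The function names are invented for this example; `torch.version.cuda`, `torch.cuda.is_available()`, `torch.cuda.get_device_name()` and `torch.cuda.get_device_capability()` are the only PyTorch calls used, and the "major version >= 10" rule simply restates the requirement above.

```python
def cuda_build_supports_turing(cuda_version):
    """Return True if a CUDA version string such as '10.0' is >= 10.

    cuda_version is expected to look like torch.version.cuda; None
    (a build without CUDA) is treated as unsupported.
    """
    if cuda_version is None:
        return False
    major = int(cuda_version.split(".")[0])
    return major >= 10


def check_installed_build():
    """Print build and GPU details; return the support flag or None if torch is missing."""
    try:
        import torch
    except ImportError:
        print("torch is not installed")
        return None
    built_with = torch.version.cuda
    print("PyTorch {} built with CUDA {}".format(torch.__version__, built_with))
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability(0)
        name = torch.cuda.get_device_name(0)
        print("GPU 0: {} (compute capability {}.{})".format(name, major, minor))
    return cuda_build_supports_turing(built_with)


check_installed_build()
```

Note that this reports the CUDA version the binary was compiled against, which can differ from the toolkit installed system-wide.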
Yeah, I hit the same problem with an RTX 2080 Super:
Ubuntu 18.04, CUDA 10.1, torch 1.3.1, cuDNN 7.6.5
Could you update to the latest stable PyTorch version and rerun your code, please?
If you still see this error, could you post a code snippet to reproduce this error?