Hi all, I am trying to train a model, but I get the error "RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED". I've been trying to solve this for a week. The error is raised when this line runs:
loss.backward()
The full traceback is:
Traceback (most recent call last):
  File "train.py", line 91, in <module>
    train()
  File "train.py", line 45, in train
    loss.backward()
  File "/home/User/.local/lib/python3.6/site-packages/torch/tensor.py", line 107, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/User/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 93, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
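To narrow down whether the failure is specific to cuDNN, one option is to run a tiny backward pass with cuDNN enabled and again with it disabled. The following is only a minimal sketch: the function name try_conv_backward and the small Conv2d layer are hypothetical stand-ins (the actual model in train.py is not shown here), and it assumes the failing operation is one that cuDNN handles, such as a convolution.

```python
def try_conv_backward(use_cudnn):
    """Run one small conv forward/backward pass and report what happened.

    Hypothetical helper for isolating the error; it is not part of train.py.
    """
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device"

    # Global switch: when False, PyTorch falls back to its non-cuDNN kernels.
    torch.backends.cudnn.enabled = use_cudnn
    try:
        conv = torch.nn.Conv2d(3, 8, kernel_size=3).cuda()
        x = torch.randn(4, 3, 32, 32, device="cuda")
        loss = conv(x).sum()
        loss.backward()
        # CUDA kernels run asynchronously; synchronize so errors surface here.
        torch.cuda.synchronize()
        return "ok"
    except RuntimeError as exc:
        return "failed: {}".format(exc)


if __name__ == "__main__":
    # A CUDA failure can leave the context unusable, so if the first call
    # fails it is safer to test the other setting in a fresh process.
    print("cuDNN enabled: ", try_conv_backward(True))
    print("cuDNN disabled:", try_conv_backward(False))
```

If the pass succeeds only with cuDNN disabled, that would point at the cuDNN library or its interaction with the driver rather than at the training code itself.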
My graphics card is an NVIDIA RTX 2060 (Mobile).
I am running Python 3.6.8.
I installed torch and torchvision by following the official PyTorch installation guide.
I installed CUDA and cuDNN from NVIDIA's official sources: CUDA 10.0 and cuDNN v7.6.1 (June 24, 2019) for CUDA 10.0.
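Since the prebuilt PyTorch packages ship with their own CUDA runtime and cuDNN, the versions PyTorch actually uses may differ from the system-wide ones listed above. A minimal sketch for printing what PyTorch itself reports is shown below; report_torch_env is a hypothetical helper name, and the output depends entirely on the local install.

```python
import sys


def report_torch_env():
    """Collect the versions PyTorch reports, which may differ from the system toolkit."""
    info = {"python": sys.version.split()[0]}
    try:
        import torch
    except ImportError:
        info["torch"] = None
        return info

    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda          # CUDA version PyTorch was built with
    info["cudnn"] = torch.backends.cudnn.version()   # None if cuDNN is unavailable
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in report_torch_env().items():
        print("{}: {}".format(key, value))
```

Comparing these values with the nvcc and nvidia-smi output helps show whether the mismatch, if any, is between PyTorch's bundled libraries and the installed driver.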
nvcc -V output is:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
Paths defined in my .bashrc file:
export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64
export LIBRARY_PATH=$LIBRARY_PATH:/usr/local/cuda-10.0/lib64
nvidia-smi output is:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P5    10W /  N/A |    497MiB /  5904MiB |      6%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1710      G   /usr/lib/xorg/Xorg                           229MiB |
|    0      1842      G   /usr/bin/gnome-shell                          88MiB |
|    0     12062      G   ...equest-channel-token=817002320788015348    50MiB |
|    0     12484      G   ...quest-channel-token=3040190697604709129    77MiB |
|    0     12845      G   ...quest-channel-token=2447417469316796923    49MiB |
+-----------------------------------------------------------------------------+
My Linux distro is Pop!_OS 18.04.
If you need more information, please let me know. Can anyone help me solve this?