MMDetection demo returns RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

polyteddy · December 17, 2022, 4:16am

I’ve attempted to install this repo: sandipan211/ZSD-SC-Resolver: Resolving semantic confusions for improved zero-shot detection (BMVC 2022)
The original work used a Linux environment, which I made every effort to reproduce under Windows:

PyTorch version: 1.1.0
Is debug build: False
CUDA used to build PyTorch: 9.0
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Enterprise
GCC version: Could not collect
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.7.3 (default, Apr 24 2019, 15:29:51) [MSC v.1915 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19041-SP0
Is CUDA available: True
CUDA runtime version: 10.2.89
GPU models and configuration: GPU 0: Quadro T2000
Nvidia driver version: 527.41
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: No

Versions of relevant libraries:
[pip3] numpy==1.21.5
[pip3] torch==1.1.0
[pip3] torchvision==0.3.0
[conda] blas 1.0 mkl
[conda] cudatoolkit 9.0 1
[conda] mkl 2021.4.0 haa95532_640
[conda] mkl-service 2.4.0 py37h2bbff1b_0
[conda] mkl_fft 1.3.1 py37h277e83a_0
[conda] mkl_random 1.2.2 py37hf11a4ad_0
[conda] numpy 1.21.5 py37h7a0a035_3
[conda] numpy-base 1.21.5 py37hca35cd5_3
[conda] pytorch 1.1.0 py3.7_cuda90_cudnn7_1 pytorch
[conda] torchvision 0.3.0 pypi_0 pypi

When I run one of the example Jupyter notebooks in the mmdetection/demo folder, which does basic image detection, I keep hitting the dreaded “RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED” error:

~\Miniconda3\envs\zsd1\lib\site-packages\torch\nn\modules\conv.py in forward(self, input)
    336                             _pair(0), self.dilation, self.groups)
    337         return F.conv2d(input, self.weight, self.bias, self.stride,
--> 338                         self.padding, self.dilation, self.groups)
    339
    340

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

What might be some root causes of this issue? An obsolete cuDNN version? An unsupported GPU? Something else? I would really appreciate any ideas.

ptrblck · December 17, 2022, 6:33am

PyTorch 1.1.0 with CUDA 9.0 and cuDNN 7.1 is quite old by now, so could you update to the latest release (1.13.1) and check whether you still see the error?
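
One plausible reason an old build fails here, though it is not confirmed in this thread, is that the Quadro T2000 is a Turing-generation GPU (compute capability 7.5), and Turing support first appeared in CUDA 10.0, so binaries built against CUDA 9.0 may not contain kernels for it. The sketch below is a rough way to check that kind of mismatch. The lookup table and the helper names (`MIN_CUDA_FOR_CAPABILITY`, `parse_version`, `build_supports_gpu`, `report`) are illustrative assumptions, not part of PyTorch; only the `torch.*` calls inside `report()` are real PyTorch APIs.

```python
# Minimum CUDA toolkit release able to target each GPU compute capability.
# Deliberately incomplete; extend it for other architectures as needed.
MIN_CUDA_FOR_CAPABILITY = {
    (6, 0): (8, 0),   # Pascal
    (6, 1): (8, 0),   # Pascal
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (e.g. Quadro T2000)
    (8, 0): (11, 0),  # Ampere
    (8, 6): (11, 1),  # Ampere
}


def parse_version(text):
    """Turn a version string such as '9.0' or '10.2.89' into (major, minor)."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def build_supports_gpu(cuda_build, capability):
    """Return True/False, or None when the capability is not in the table."""
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_build) >= required


def report():
    """Print the versions that matter and try one small cuDNN convolution."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("torch:", torch.__version__)
    print("CUDA used to build PyTorch:", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    if torch.version.cuda is None or not torch.cuda.is_available():
        print("No usable CUDA device visible to this build")
        return
    capability = torch.cuda.get_device_capability(0)
    print("GPU:", torch.cuda.get_device_name(0), "capability", capability)
    print("build supports GPU:",
          build_supports_gpu(torch.version.cuda, capability))
    # A tiny convolution exercises the same code path as the failing demo.
    x = torch.randn(1, 3, 32, 32, device="cuda")
    w = torch.randn(8, 3, 3, 3, device="cuda")
    print("conv2d output shape:", tuple(torch.nn.functional.conv2d(x, w).shape))


if __name__ == "__main__":
    report()
```

With the versions from the original report, `build_supports_gpu("9.0", (7, 5))` evaluates to `False`. Note that the "CUDA runtime version: 10.2.89" line in the environment dump refers to the locally installed toolkit, whereas the conda binaries ship their own CUDA 9.0 runtime.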

polyteddy · December 18, 2022, 12:28am

Yes, thank you. Updating let me make some progress. I then ran into AT_CHECK errors, but was able to fix them by adding the following to every cpp file:

#ifndef AT_CHECK
#define AT_CHECK TORCH_CHECK
#endif
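
For anyone with many source files to patch, the guard above can be applied with a short script instead of by hand. This is only a sketch under assumptions: `add_guard` and `patch_tree` are hypothetical helper names, the guard is placed after the last `#include` line so that `TORCH_CHECK` is already declared, and only `*.cpp` files are touched by default. Back up or commit the tree before running it.

```python
from pathlib import Path

# The compatibility guard from the post above.
GUARD = "#ifndef AT_CHECK\n#define AT_CHECK TORCH_CHECK\n#endif\n"


def add_guard(source):
    """Return source with the guard inserted after the last #include line.

    Text that never mentions AT_CHECK, or that already defines it,
    is returned unchanged.
    """
    if "AT_CHECK" not in source or "#define AT_CHECK" in source:
        return source
    lines = source.splitlines(keepends=True)
    include_rows = [i for i, line in enumerate(lines)
                    if line.lstrip().startswith("#include")]
    insert_at = include_rows[-1] + 1 if include_rows else 0
    # Make sure the preceding line ends with a newline before inserting.
    if insert_at > 0 and not lines[insert_at - 1].endswith("\n"):
        lines[insert_at - 1] += "\n"
    lines.insert(insert_at, GUARD)
    return "".join(lines)


def patch_tree(root, patterns=("*.cpp",)):
    """Apply add_guard to every matching file under root; return changed paths."""
    changed = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            original = path.read_text(encoding="utf-8")
            patched = add_guard(original)
            if patched != original:
                path.write_text(patched, encoding="utf-8")
                changed.append(path)
    return changed
```

Calling `patch_tree(".")` from the repository root would rewrite the affected files in place and return the list of paths it changed. Running it a second time should change nothing, since files that already define `AT_CHECK` are skipped.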