Got CUDNN_STATUS_NOT_INITIALIZED when calling backward() through a module written with the C++ API

I get the error message below when calling backward() through a module written as a C++ extension.
I wrote the module based on this tutorial:
https://pytorch.org/tutorials/advanced/cpp_extension.html
The module works fine when calling forward() or backward() alone.
It also works fine when forward() is called together with other torch layers in Python.
The following error only occurs when backward() is called and the module is connected with other layers.

Traceback (most recent call last):
  File "/home/wctu/Project/semseg/spn/ptsemseg/models/spn_2way.py", line 554, in <module>
    out.backward()
  File "/home/wctu/.local/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/wctu/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Exception raised from createCuDNNHandle at /pytorch/aten/src/ATen/cudnn/Handle.cpp:9 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fc019acd1e2 in /home/wctu/.local/lib/python3.6/site-packages/torch/lib/libc10.so)

According to a previous post, the PyTorch installation should already take care of the CUDA and cuDNN libraries. Does the raised error mean I am using a mismatched CUDA driver version?
I am using PyTorch 1.6 and CUDA 10.2 on an Ubuntu 18.04 machine. The CUDA driver version is 440.95.01.

Update:
I found the issue has nothing to do with cuDNN. It is similar to post 26114: the error was caused by wrong indexing in my CUDA kernel. Once I fixed the indexing bug, everything works fine.
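For anyone hitting the same class of bug: the usual culprit is a kernel thread indexing past the end of the tensor. A minimal sketch of the standard bounds-check guard (hypothetical kernel and names, not my actual code):

// Minimal sketch of the standard bounds check (hypothetical kernel, not the actual module).
// Each thread guards its global index against the number of elements, so threads in the
// last block do not read or write out of bounds.
__global__ void scale_kernel(const float* __restrict__ in,
                             float* __restrict__ out,
                             float alpha,
                             int64_t numel) {
  const int64_t idx = (int64_t)blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < numel) {          // without this guard, the last block indexes past the tensor
    out[idx] = alpha * in[idx];
  }
}

// Launch with enough blocks to cover numel, rounding up:
//   const int threads = 256;
//   const int blocks  = (numel + threads - 1) / threads;
//   scale_kernel<<<blocks, threads>>>(in_ptr, out_ptr, alpha, numel);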

Good to hear you narrowed down the indexing issue.
For the sake of completeness: due to the asynchronous execution of CUDA kernels, the error might surface in a different library or at an unrelated line of code. To get a better stack trace, which should point to the failing operation, you could run the script with blocking launches: CUDA_LAUNCH_BLOCKING=1 python script.py args.
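Alternatively, while debugging the extension itself, you can check for launch and execution errors right after the kernel launch using the plain CUDA runtime API. A sketch (the kernel name and launch parameters are placeholders; TORCH_CHECK comes from <torch/extension.h>):

// Debug-only check placed right after a kernel launch in the extension's .cu file.
// cudaGetLastError() catches invalid launch configurations; the explicit
// cudaDeviceSynchronize() forces asynchronous execution errors (e.g. illegal
// memory accesses) to be reported here instead of at a later, unrelated call.
my_kernel<<<blocks, threads>>>(in_ptr, out_ptr, numel);

cudaError_t err = cudaGetLastError();
TORCH_CHECK(err == cudaSuccess, "kernel launch failed: ", cudaGetErrorString(err));

err = cudaDeviceSynchronize();   // remove once debugging is done; it serializes execution
TORCH_CHECK(err == cudaSuccess, "kernel execution failed: ", cudaGetErrorString(err));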
