I get the error message below when calling backward() through a module written as a C++ extension.
I wrote the module based on this tutorial.
The module works fine when calling forward or backward alone.
It also works fine when calling forward together with other torch layers in Python.
The error only occurs when calling backward() while the module is connected to other layers.
Traceback (most recent call last):
File "/home/wctu/Project/semseg/spn/ptsemseg/models/spn_2way.py", line 554, in
File "/home/wctu/.local/lib/python3.6/site-packages/torch/tensor.py", line 185, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/wctu/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 127, in backward
allow_unreachable=True) # allow_unreachable flag
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
Exception raised from createCuDNNHandle at /pytorch/aten/src/ATen/cudnn/Handle.cpp:9 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fc019acd1e2 in /home/wctu/.local/lib/python3.6/site-packages/torch/lib/libc10.so)
According to a previous post, the PyTorch installation already ships with its own CUDA and cuDNN libraries. Does this error mean I am using a mismatched CUDA driver version?
I am using PyTorch 1.6 and CUDA 10.2 on an Ubuntu 18.04 machine. The CUDA driver version is 440.95.01.
I found that the issue has nothing to do with cuDNN. It is similar to post 26114, where the error was caused by wrong indexing in a CUDA kernel: an out-of-bounds access corrupts device memory, and the failure then surfaces later as an unrelated-looking error (here, during cuDNN handle creation in backward). Once I fixed the indexing bug in my kernel, everything works fine.
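For anyone hitting the same symptom: the usual culprit is a kernel launched with more threads than elements, without a bounds guard. A minimal sketch of the bug and the fix (the kernel name and sizes here are hypothetical, not from my actual module):

```cuda
#include <cuda_runtime.h>

// BUGGY: when n is not a multiple of blockDim.x, the last block's extra
// threads read/write past the end of the arrays, corrupting device memory.
// The crash often appears later in an unrelated call (e.g. cuDNN init).
__global__ void scale_buggy(float* out, const float* in, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = in[i] * s;  // no bounds check
}

// FIXED: guard every computed index against the array length.
__global__ void scale_fixed(float* out, const float* in, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {         // bounds check prevents out-of-range access
        out[i] = in[i] * s;
    }
}
```

Running the script under `cuda-memcheck` (or `compute-sanitizer` on newer toolkits) reports the out-of-bounds access at the actual faulting kernel instead of the misleading downstream error.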