Error using cuDNN in a custom CUDA file

I am writing my own CUDA implementation of a CNN, and I’m compiling it as a C++ extension using Ninja. All the basic CUDA code (e.g., malloc, memcpy, kernel launches) works fine, but I am having an issue using cuDNN in my custom extension.

I’m running PyTorch 1.10 in a Conda environment. PyTorch is using cuDNN (torch.backends.cudnn.enabled == True and torch.backends.cudnn.version() returns 8200).
I also have cuDNN installed under /usr/local/, since I think that is the CUDA installation the Ninja build uses?

When I call cuDNN functions, the extension builds fine, but when I call it from a Python script, I get an error:
undefined symbol: cudnnCreateTensorDescriptor

<cudnn.h> is included in my .cu file. Also, when I compile this code with nvcc directly (passing the -lcudnn flag), it compiles and runs fine. But if I pass the same flag in the file that builds the extension, I still get the same error.

Specifically, I added the extra_compile_args line to my file to build the extension:

              name='<cuda code>',
              sources=['<cuda file>.cu'],
              extra_compile_args={'nvcc': ['-lcudnn']}

I also confirmed in my build log that -lcudnn is being passed to nvcc. But I still get the same error.

Is there something else I need to do to get nvcc to link the cudnn library in when it builds the extension?

Could you try passing -lcudnn via extra_ldflags = ["-lcudnn"]?

No luck. I added extra_ldflags = ["-lcudnn"] as you said, but this time it didn’t even add -lcudnn as an argument to nvcc when I built the extension.

When I used extra_compile_args={'nvcc': ['-lcudnn']}, it did add -lcudnn as an argument, but I still got the error I showed above.

EDIT: I also just noticed this line at the top of the outputs when I build the extension:

UserWarning: Unknown Extension options: 'extra_ldflags'
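A possible reading of this warning, offered as a sketch rather than a confirmed diagnosis: extra_ldflags is a parameter of the JIT path (torch.utils.cpp_extension.load), while CUDAExtension forwards its keyword arguments to setuptools.Extension, which takes libraries, library_dirs and extra_link_args instead. Flags in extra_compile_args only reach the compile step, so -lcudnn there never reaches the link step. The names my_cudnn_ext and my_cudnn_ext.cu below are hypothetical placeholders, and the optional library directory is an assumption about where libcudnn lives.

```python
def cudnn_link_kwargs(cudnn_lib_dir=None):
    """Keyword arguments asking the *link* step (not the nvcc compile step) for libcudnn."""
    kwargs = {"libraries": ["cudnn"]}  # becomes -lcudnn on the link line
    if cudnn_lib_dir is not None:
        # Only needed if libcudnn is not on the default library search path.
        kwargs["library_dirs"] = [cudnn_lib_dir]  # becomes -L<dir>
    return kwargs


def main():
    # Imported lazily so the helper above can be used without PyTorch installed.
    from setuptools import setup
    from torch.utils.cpp_extension import BuildExtension, CUDAExtension

    setup(
        name="my_cudnn_ext",  # hypothetical package name
        ext_modules=[
            CUDAExtension(
                name="my_cudnn_ext",           # hypothetical module name
                sources=["my_cudnn_ext.cu"],   # hypothetical source file
                **cudnn_link_kwargs(),
            )
        ],
        cmdclass={"build_ext": BuildExtension},
    )
```

In an actual setup.py, main() would be called at the bottom of the file before running the usual build or install command.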

That’s strange, as it’s used in this test. Could you check if you could run and build this test?

Sorry, I’ve never run any tests that come with PyTorch. Is there any documentation on how I can run this test to check?

You can git clone the PyTorch repository and launch the code via python test/ -v. If you are using Python>=3.8, you can run only the matching tests via -k cudnn.

So I cloned the 1.10.0 branch of the repo, but do I need to build PyTorch again here for it to work? I’ve tried that before and had a load of issues. Is there no way to run this test using the version of PyTorch I have installed in a conda env?

Just in case, I gave that a quick try but it gave me an error for the line import expecttest. So I guess there are other things that have to be included for testing to work with the installed version of PyTorch?

You can just pip install expecttest in your current environment and execute the test afterwards.
A source build or the binaries would work.
E.g. I just reran the test using the 1.10.0+cu113 pip wheels in the source folder:

python -v -k cudnn
Fail to import hypothesis in common_utils, tests are not derandomized
test_jit_cudnn_extension (__main__.TestCppExtensionJIT) ... Using /opt/.cache/torch_extensions/py38_cu113 as PyTorch extensions root...
Creating extension directory /opt/.cache/torch_extensions/py38_cu113/torch_test_cudnn_extension...
Detected CUDA files, patching ldflags
Emitting ninja build file /opt/.cache/torch_extensions/py38_cu113/torch_test_cudnn_extension/
Building extension module torch_test_cudnn_extension...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] c++ -MMD -MF cudnn_extension.o.d -DTORCH_EXTENSION_NAME=torch_test_cudnn_extension -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /opt/miniforge3/envs/nightly_pip_cuda113/lib/python3.8/site-packages/torch/include -isystem /opt/miniforge3/envs/nightly_pip_cuda113/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /opt/miniforge3/envs/nightly_pip_cuda113/lib/python3.8/site-packages/torch/include/TH -isystem /opt/miniforge3/envs/nightly_pip_cuda113/lib/python3.8/site-packages/torch/include/THC -isystem /usr/local/cuda/include -isystem /opt/miniforge3/envs/nightly_pip_cuda113/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++14 -c /opt/libs/upstream/pytorch/test/cpp_extensions/cudnn_extension.cpp -o cudnn_extension.o 
[2/2] c++ cudnn_extension.o -shared -lcudnn -L/opt/miniforge3/envs/nightly_pip_cuda113/lib/python3.8/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda_cu -ltorch_cuda_cpp -ltorch -ltorch_python -L/usr/local/cuda/lib64 -lcudart -o
Loading extension module torch_test_cudnn_extension...

Ran 1 test in 10.621s


Thank you! I was able to run the test as well, and it produced identical results to what you showed. However, the test seems to call c++, while my build calls nvcc. And does -DPYBIND11_COMPILER_TYPE=\"_gcc\" mean it’s using GCC? How is GCC able to compile CUDA code?

Also, is this using the cuDNN version inside my conda env? When I build it the non-JIT way, it’s using the cuDNN in my /usr/local directory instead.
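
One way to check which libcudnn a built extension actually resolves to is to look at the output of ldd on the compiled .so file. The sketch below assumes a Linux system with ldd available; the function names are hypothetical helpers, not part of PyTorch, and the path to the built .so has to be supplied by the caller.

```python
import subprocess


def resolved_libs(ldd_output, needle="libcudnn"):
    """Map each library name containing `needle` to the path ldd resolved it to."""
    hits = {}
    for line in ldd_output.splitlines():
        line = line.strip()
        if needle not in line or "=>" not in line:
            continue
        name, _, rest = line.partition("=>")
        # ldd prints "name => /path/to/lib (0xaddress)" or "name => not found".
        hits[name.strip()] = rest.strip().split(" (")[0].strip()
    return hits


def cudnn_used_by(so_path):
    """Run ldd on a built extension and report the libcudnn entries, if any."""
    out = subprocess.run(
        ["ldd", so_path], check=True, capture_output=True, text=True
    ).stdout
    return resolved_libs(out)
```

An empty result would suggest the extension was not linked against libcudnn at all, which would be consistent with the undefined symbol error above. A path under the conda env or under /usr/local shows which copy the dynamic loader picks.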