PyTorch can't find CUDA header cuda_runtime_api.h

I am developing a dynamic C++ library that uses the PyTorch C++ front-end internally and cannot impose an external dependency on the PyTorch shared libraries (libtorch.so, libtorch_cuda.so, libtorch_cpu.so) on end users. It is based on an existing C++ application that dynamically links the PyTorch shared libraries. My library therefore needs to link the static versions of PyTorch (libtorch.a, libtorch_cuda.a, libtorch_cpu.a).

CMake

  • I locate Torch and the CUDA Toolkit in CMake as follows:
    find_package(CUDAToolkit 11.8.0 EXACT)
    find_package(Torch)
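  • For reference, this is roughly how I understand the results of find_package(Torch) are meant to be consumed, following the pattern documented by TorchConfig.cmake; this is a minimal sketch, not my exact CMakeLists, and mylib is a placeholder target name:

    # Minimal sketch only; "mylib" stands in for my actual library target.
    # TorchConfig.cmake sets TORCH_LIBRARIES and TORCH_CXX_FLAGS on success.
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
    add_library(mylib SHARED src/mylib.cpp)
    target_link_libraries(mylib PRIVATE ${TORCH_LIBRARIES})
    set_property(TARGET mylib PROPERTY CXX_STANDARD 14)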

Issues

1. PyTorch cannot find CUDA header: cuda_runtime_api.h

  • Compiling the target (my library) that links PyTorch static libraries produces this error:
    /opt/deepsig/.venv/lib/python3.8/site-packages/torch/include/c10/cuda/CUDAStream.h:6:10: fatal error: cuda_runtime_api.h: No such file or directory
    6 | #include <cuda_runtime_api.h>
  • The CUDA Toolkit is installed at /usr/local/cuda-11.8
  • The following symbolic links also exist:
    ls -ln /usr/local/cuda
    lrwxrwxrwx 1 0 0 22 Jun 20 20:40 /usr/local/cuda -> /etc/alternatives/cuda

    ls -ln /usr/local/cuda-11
    lrwxrwxrwx 1 0 0 25 Jun 20 20:40 /usr/local/cuda-11 -> /etc/alternatives/cuda-11
  • The file cuda_runtime_api.h exists in the expected location:
    find . -name "cuda_runtime_api.h"
    ./local/cuda-11.8/targets/x86_64-linux/include/cuda_runtime_api.h
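  • Presumably the CUDA include path can be added to the target explicitly as a workaround, since find_package(CUDAToolkit) exposes it. A sketch (mylib is again a placeholder target name):

    # Workaround sketch ("mylib" is a placeholder target name).
    # FindCUDAToolkit provides CUDAToolkit_INCLUDE_DIRS and imported targets
    # such as CUDA::cudart; either should make cuda_runtime_api.h visible.
    target_include_directories(mylib PRIVATE ${CUDAToolkit_INCLUDE_DIRS})
    # or, equivalently, pull in the include path via the imported target:
    target_link_libraries(mylib PRIVATE CUDA::cudart)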

2. find_package(Torch): TorchConfig.cmake cannot find certain third-party static libraries, producing the following warnings during the CMake configure/generate step (a snippet to check which archives actually shipped with the wheel is sketched after the warnings).

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library onnx_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:104 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library foxi_loader_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:106 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library fmt_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:106 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library eigen_blas_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:113 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)
    -- Found Torch: /opt/deepsig/.venv/lib/python3.8/site-packages/torch/lib/libtorch.a
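
    A quick way to see which static archives are actually present in the installed wheel is a check like the following, placed after find_package(Torch). This is a debugging sketch; it assumes TORCH_INSTALL_PREFIX is exposed to the caller by TorchConfig.cmake, which is how I read that file:

    # Debugging sketch: list every static archive that shipped with the wheel,
    # to see whether onnx / foxi_loader / fmt / eigen_blas archives exist at all.
    file(GLOB _torch_archives "${TORCH_INSTALL_PREFIX}/lib/*.a")
    foreach(_lib IN LISTS _torch_archives)
      message(STATUS "torch static archive: ${_lib}")
    endforeach()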

PyTorch Build Environment
Docker base image: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
OS: Ubuntu 20.04
Python: 3.8
CUDA: 11.8.0
PyTorch: 1.11.0

I configure the PyTorch build in this Docker image, build the wheel, and install it in a development Docker container that is used to build my C++ library.

  1. Build PyTorch with static libraries

    a. Download CUDA dependencies

    • libnvinfer8=8.5.3-1+cuda11.8
    • libnvinfer-plugin8=8.5.3-1+cuda11.8
    • python3-libnvinfer-dev=8.5.3-1+cuda11.8
    • python3-libnvinfer=8.5.3-1+cuda11.8
    • libnvinfer-dev=8.5.3-1+cuda11.8
    • libnvinfer-plugin-dev=8.5.3-1+cuda11.8
    • libnvparsers-dev=8.5.3-1+cuda11.8
    • libnvonnxparsers-dev=8.5.3-1+cuda11.8
    • libnvparsers8=8.5.3-1+cuda11.8
    • libnvonnxparsers8=8.5.3-1+cuda11.8

    b. Download PyTorch Library Dependencies

    • git
    • gnupg
    • libprotobuf-dev
    • wget

    c. Clone PyTorch

    • I have been unable to configure the PyTorch build so that the static libraries are included in the wheel package without manually editing setup.py to add all lib/*.a to torch_package_data:
        git clone https://github.com/pytorch/pytorch.git && \
        cd pytorch && \
        git checkout v1.11.0 && git submodule update --init --recursive && \
        sed -i '1061s/.*/"lib\/*.a"],/' setup.py
    

    d. Configure PyTorch and build wheel

        export BUILD_SHARED_LIBS=0
        export USE_STATIC_MKL=1
        export TORCH_BUILD_TEST=0
        export TORCH_USE_MKLDNN=0
        export TORCH_USE_TENSORPIPE=0
        export USE_KENETO=0
        export CUDA_HOME=/usr/local/cuda-11.8
        export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
        export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH
        export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
        export CFLAGS="-I$CUDA_HOME/include $CFLAGS"
        cd pytorch
        python setup.py bdist_wheel
    

    e. Install wheel in development container

       python -m pip install /path/to/<pytorch>.whl
    

2. Result
The PyTorch build process described above produces the following libraries in my development container:

  • libXNNPACK.a, libc10.a, libcaffe2_nvrtc.so, libclog.a, libfbgemm.a, libgloo_cuda.a, libnnpack.a, libprotobuf.a, libpthreadpool.a, libqnnpack.a, libsleef.a, libtorch_cpu.a, libtorch_python.so, libasmjit.a, libc10_cuda.a, libcaffe2_protos.a, libcpuinfo.a, libgloo.a, libkineto.a, libprotobuf-lite.a, libprotoc.a, libpytorch_qnnpack.a, libshm.so, libtorch.a, libtorch_cuda.a

libtorch_static + CUDA is essentially a no-go, as it quickly runs into the 2 GB .cubin problem, and libcudnn.a performance is much worse than that of the shared library; see "libtorch_cuda.so is missing fast kernels from libcudnn_static.a, therefore statically linked cuDNN could be much slower than dynamically linked" · Issue #50153 · pytorch/pytorch · GitHub

I don’t think the performance issue is still a thing 3 years later, but of course the size issue could be even worse.