PyTorch can't find CUDA header cuda_runtime_api.h

I am developing a dynamic C++ library that uses the PyTorch C++ front-end internally and cannot impose an external dependency on the PyTorch shared libraries (libtorch.so, libtorch_cuda.so, libtorch_cpu.so) on end users. It is based on an existing C++ application that dynamically links the PyTorch shared libraries. My library therefore needs to link the static versions of PyTorch (libtorch.a, libtorch_cuda.a, libtorch_cpu.a).

CMake

  • I locate Torch and the CUDA Toolkit in CMake as follows:
    find_package(CUDAToolkit 11.8.0 EXACT)
    find_package(Torch)
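  • For reference, this is roughly how I understand the results of find_package(Torch) are meant to be consumed, following the pattern documented by TorchConfig.cmake; this is a minimal sketch, not my exact CMakeLists, and mylib is a placeholder target name:

    # Minimal sketch only; "mylib" stands in for my actual library target.
    # TorchConfig.cmake sets TORCH_LIBRARIES and TORCH_CXX_FLAGS on success.
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
    add_library(mylib SHARED src/mylib.cpp)
    target_link_libraries(mylib PRIVATE ${TORCH_LIBRARIES})
    set_property(TARGET mylib PROPERTY CXX_STANDARD 14)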

Issues

1. PyTorch cannot find CUDA header: cuda_runtime_api.h

  • Compiling the target (my library) that links PyTorch static libraries produces this error:
    /opt/deepsig/.venv/lib/python3.8/site-packages/torch/include/c10/cuda/CUDAStream.h:6:10: fatal error: cuda_runtime_api.h: No such file or directory
    6 | #include <cuda_runtime_api.h>
  • The CUDA Toolkit is installed at /usr/local/cuda-11.8
  • The following symbolic links also exist:
    ls -ln /usr/local/cuda
    lrwxrwxrwx 1 0 0 22 Jun 20 20:40 /usr/local/cuda -> /etc/alternatives/cuda

    ls -ln /usr/local/cuda-11
    lrwxrwxrwx 1 0 0 25 Jun 20 20:40 /usr/local/cuda-11 -> /etc/alternatives/cuda-11
  • The file cuda_runtime_api.h exists in the expected location:
    find . -name "cuda_runtime_api.h"
    ./local/cuda-11.8/targets/x86_64-linux/include/cuda_runtime_api.h
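  • Presumably the CUDA include path can be added to the target explicitly as a workaround, since find_package(CUDAToolkit) exposes it. A sketch (mylib is again a placeholder target name):

    # Workaround sketch ("mylib" is a placeholder target name).
    # FindCUDAToolkit provides CUDAToolkit_INCLUDE_DIRS and imported targets
    # such as CUDA::cudart; either should make cuda_runtime_api.h visible.
    target_include_directories(mylib PRIVATE ${CUDAToolkit_INCLUDE_DIRS})
    # or, equivalently, pull in the include path via the imported target:
    target_link_libraries(mylib PRIVATE CUDA::cudart)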

2. find_package(Torch): TorchConfig.cmake cannot find certain third-party static libraries, producing the following warnings during the CMake configure/generate step (a snippet to check which archives actually shipped with the wheel is sketched after the warnings).

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library onnx_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:104 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library foxi_loader_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:106 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library fmt_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:106 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)

    CMake Warning at /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message):
      static library eigen_blas_LIBRARY-NOTFOUND not found.
    Call Stack (most recent call first):
      /opt/deepsig/.venv/lib/python3.8/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:113 (append_torchlib_if_found)
      CMakeLists.txt:227 (find_package)
    -- Found Torch: /opt/deepsig/.venv/lib/python3.8/site-packages/torch/lib/libtorch.a
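
    A quick way to see which static archives are actually present in the installed wheel is a check like the following, placed after find_package(Torch). This is a debugging sketch; it assumes TORCH_INSTALL_PREFIX is exposed to the caller by TorchConfig.cmake, which is how I read that file:

    # Debugging sketch: list every static archive that shipped with the wheel,
    # to see whether onnx / foxi_loader / fmt / eigen_blas archives exist at all.
    file(GLOB _torch_archives "${TORCH_INSTALL_PREFIX}/lib/*.a")
    foreach(_lib IN LISTS _torch_archives)
      message(STATUS "torch static archive: ${_lib}")
    endforeach()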

PyTorch Build Environment
Docker base image: nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
OS: Ubuntu 20.04
Python: 3.8
CUDA: 11.8.0
PyTorch: 1.11.0

I configure the PyTorch build in this Docker image, build the wheel, and install it in a development Docker container that is used to build my C++ library.

  1. Build PyTorch with static libraries

    a. Download CUDA dependencies

    • libnvinfer8=8.5.3-1+cuda11.8
    • libnvinfer-plugin8=8.5.3-1+cuda11.8
    • python3-libnvinfer-dev=8.5.3-1+cuda11.8
    • python3-libnvinfer=8.5.3-1+cuda11.8
    • libnvinfer-dev=8.5.3-1+cuda11.8
    • libnvinfer-plugin-dev=8.5.3-1+cuda11.8
    • libnvparsers-dev=8.5.3-1+cuda11.8
    • libnvonnxparsers-dev=8.5.3-1+cuda11.8
    • libnvparsers8=8.5.3-1+cuda11.8
    • libnvonnxparsers8=8.5.3-1+cuda11.8

    b. Download PyTorch Library Dependencies

    • git
    • gnupg
    • libprotobuf-dev
    • wget

    c. Clone PyTorch

    • I have been unable to configure the PyTorch build so that the static libraries are included in the wheel package without manually editing setup.py to add all lib/*.a to torch_package_data:
        git clone https://github.com/pytorch/pytorch.git && \
        cd pytorch && \
        git checkout v1.11.0 && git submodule update --init --recursive && \
        sed -i '1061s/.*/"lib\/*.a"],/' setup.py
    

    d. Configure PyTorch and build wheel

        export BUILD_SHARED_LIBS=0
        export USE_STATIC_MKL=1
        export TORCH_BUILD_TEST=0
        export TORCH_USE_MKLDNN=0
        export TORCH_USE_TENSORPIPE=0
        export USE_KENETO=0
        export CUDA_HOME=/usr/local/cuda-11.8
        export CUDA_TOOLKIT_ROOT_DIR=$CUDA_HOME
        export LIBRARY_PATH=$CUDA_HOME/lib64:$LIBRARY_PATH
        export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
        export CFLAGS="-I$CUDA_HOME/include $CFLAGS"
        cd pytorch
        python setup.py bdist_wheel
    

    e. Install wheel in development container

       python -m pip install /path/to/<pytorch>.whl
    

2. Result
The PyTorch build process described above produces the following libraries in my development container:

  • libXNNPACK.a, libc10.a, libcaffe2_nvrtc.so, libclog.a, libfbgemm.a, libgloo_cuda.a, libnnpack.a, libprotobuf.a, libpthreadpool.a, libqnnpack.a, libsleef.a, libtorch_cpu.a, libtorch_python.so, libasmjit.a, libc10_cuda.a, libcaffe2_protos.a, libcpuinfo.a, libgloo.a, libkineto.a, libprotobuf-lite.a, libprotoc.a, libpytorch_qnnpack.a, libshm.so, libtorch.a, libtorch_cuda.a

libtorch_static + CUDA is essentially a no-go, as it quickly runs into the 2 GB .cubin problem, and libcudnn.a performance is much worse than that of the shared library; see "libtorch_cuda.so is missing fast kernels from libcudnn_static.a, therefore statically linked cuDNN could be much slower than dynamically linked" · Issue #50153 · pytorch/pytorch · GitHub

I don’t think the performance issue is still a thing 3 years later, but of course the size issue could be even worse.