Libtorch CMake issues

Hi,

I have been having difficulty getting the basic CMake example working with PyTorch, as in https://pytorch.org/tutorials/advanced/cpp_export.html. I have spent about five hours adding different flags for CUDA/cuDNN (I am not using GPUs anyway, but these packages seem to be required and I do have them installed) and tweaking the CMakeLists.txt file. I haven’t been successful, so I am asking for help. I see the following log when I run a script make_cmake.sh (which runs cmake with flags) and then make:

-- The C compiler identification is GNU 8.2.0
-- The CXX compiler identification is GNU 8.2.0
-- Check for working C compiler: /cm/shared/sw/pkg/devel/gcc/8.2.0/bin/cc
-- Check for working C compiler: /cm/shared/sw/pkg/devel/gcc/8.2.0/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /cm/shared/sw/pkg/devel/gcc/8.2.0/bin/c++
-- Check for working CXX compiler: /cm/shared/sw/pkg/devel/gcc/8.2.0/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /cm/shared/sw/pkg/devel/cuda/9.0.176 (found suitable version "9.0", minimum required is "7.0") 
-- Caffe2: CUDA detected: 9.0
-- Caffe2: CUDA nvcc is: /cm/shared/sw/pkg/devel/cuda/9.0.176/bin/nvcc
-- Caffe2: CUDA toolkit directory: /cm/shared/sw/pkg/devel/cuda/9.0.176
-- Caffe2: Header version is: 9.0
-- Found CUDNN: /cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/include  
-- Found cuDNN: v7.0.5  (include: /cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/include, library: /cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/lib)
-- Automatic GPU detection failed. Building for common architectures.
-- Autodetected CUDA architecture(s): 3.0;3.5;5.0;5.2;6.0;6.1;7.0;7.0+PTX
-- Added CUDA NVCC flags for: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_70,code=compute_70
-- Found torch: /mnt/ceph/users/mcranmer/Downloads/libtorch/lib/libtorch.so  
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/ceph/users/mcranmer/.../build

make (note: see updated error below!):

Scanning dependencies of target run_pytorch
make[2]: *** No rule to make target `/cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/lib', needed by `run_pytorch'.  Stop.
make[1]: *** [CMakeFiles/run_pytorch.dir/all] Error 2
make: *** [all] Error 2

The make_cmake.sh file (in the build directory) is as follows (the … stands in for a long directory path):

#!/bin/bash

rm CMakeCache.txt

module load cuda/9.0.176 cudnn/v7.0-cuda-9.0 gcc/8.2.0 lib/openblas/0.2.19-haswell slurm openmpi
FLAGS="-DCUDA_TOOLKIT_ROOT_DIR=/cm/shared/sw/pkg/devel/cuda/9.0.176 -DTORCH_LIBRARIES=/mnt/ceph/users/mcranmer/Downloads/libtorch -DCMAKE_INSTALL_PREFIX=/mnt/ceph/users/mcranmer/.../build -DCMAKE_PREFIX_PATH=/mnt/ceph/users/mcranmer/Downloads/libtorch -DCUDA_HOST_COMPILER=/usr/bin/gcc44 -DCUDNN_INCLUDE_DIR=/cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/include -DCUDNN_LIBRARY=/cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/lib"

CMAKE=/mnt/ceph/users/mcranmer/Downloads/cmake-3.13.0-rc2-Linux-x86_64/bin/cmake 

$CMAKE $FLAGS ..
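
For completeness, the script is run from inside the build directory roughly like this (VERBOSE=1 is optional; it just makes make print the full compile and link commands, which can help when debugging link errors like the one above):

bash make_cmake.sh
make VERBOSE=1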

My CMakeLists.txt file is the standard one:

cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
project(custom_ops)

find_package(Torch REQUIRED)

add_executable(run_pytorch run_pytorch_1d.cpp)
target_link_libraries(run_pytorch "${TORCH_LIBRARIES}")
set_property(TARGET run_pytorch PROPERTY CXX_STANDARD 11)

The code I am attempting to compile (run_pytorch_1d.cpp) is below; it should just load a PyTorch model and not do anything with it:

#include <torch/script.h> // One-stop header.

#include <cstdlib>
#include <iostream>
#include <memory>

#include "run_pytorch_1d.h"

#define N_FEATURES 13

float run_pytorch_1d_cpp(float *x) {
    // Load the TorchScript model; the module is intentionally unused, since this
    // function only exists to check that we can compile and link against libtorch.
    std::shared_ptr<torch::jit::script::Module> module = torch::jit::load("/mnt/ceph/users/mcranmer/.../model_to_load_from_cpp.pt");
    return x[0] * x[0];
}

int main(int argc, const char* argv[]) {
    float x[N_FEATURES] = {1};
    printf("%f\n", x[0]);
    return 0;
}

Any idea what’s going on? Earlier I had an issue where it was trying to build some CUDA library (libcu…a) instead of using the ones from my installation, because it was looking in the wrong directory; I guess the flags fixed that.


So the actual issue is:

make[2]: *** No rule to make target `/usr/local/cuda/lib64/libculibos.a', needed by `run_pytorch'.  Stop.

not the earlier one. The earlier one was because I wrote …/cudnn…/lib instead of …/cudnn…/lib64.
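
For reference, the corrected flag is the lib64 version of the path from make_cmake.sh above (depending on the FindCUDNN logic in use, CUDNN_LIBRARY may need to point at the libcudnn.so file inside that directory rather than at the directory itself):

-DCUDNN_LIBRARY=/cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/lib64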

I note that even if I append the absolute location of libculibos.a (which is not in /usr/local/cuda/lib64) to target_link_libraries in CMakeLists.txt, I see the same error. I have no idea why it thinks any of my CUDA libraries are in /usr/local/cuda/lib64, as that directory does not exist on my system.

Should I post this on GitHub? I am not sure if this forum is the right place.

Okay, so I have a band-aid fix that works for my current setup. It’s ugly, but it works; I would not consider this a solution, and I still want to know what went wrong with CMake.

After running cmake, I edit CMakeFiles/run_pytorch.dir/build.make and comment out the following lines:

run_pytorch: /usr/lib64/libcuda.so

and

run_pytorch: /usr/local/cuda/lib64/libculibos.a

I have no idea why these lines are included. After this, I edit CMakeFiles/run_pytorch.dir/build.make (which I found by grep-ing for “libculibos”) and remove both /cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/lib64 and /usr/local/cuda/lib64/libculibos.a from the link line (why is it trying to build a directory?), leaving:

/cm/shared/sw/pkg/devel/gcc/8.2.0/bin/c++    -rdynamic CMakeFiles/run_pytorch.dir/run_pytorch_1d.cpp.o  -o run_pytorch -Wl,-rpath,/mnt/ceph/users/mcranmer/Downloads/libtorch/lib -Wl,-Bstatic -lculibos -Wl,-Bdynamic /mnt/ceph/users/mcranmer/Downloads/libtorch/lib/libtorch.so -lcuda -lnvrtc -lnvToolsExt -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -lpthread -ldl -lrt -Wl,--no-as-needed,/mnt/ceph/users/mcranmer/Downloads/libtorch/lib/libcaffe2.so -Wl,--as-needed -Wl,--no-as-needed,/mnt/ceph/users/mcranmer/Downloads/libtorch/lib/libcaffe2_gpu.so -Wl,--as-needed -Wl,-Bstatic -lcudart_static -Wl,-Bdynamic -ldl -lrt /mnt/ceph/users/mcranmer/Downloads/libtorch/lib/libcaffe2.so /mnt/ceph/users/mcranmer/Downloads/libtorch/lib/libc10.so -lpthread -lcufft /cm/shared/sw/pkg/devel/cuda/9.0.176/lib64/libcurand.so -lcublas -Wl,-Bstatic -lcublas_device -Wl,-Bdynamic 

Running “make” then works, and I can execute the resulting run_pytorch binary without problems. Note that I am not using CUDA, so linking problems with CUDA libraries won’t show up for me.

Does anybody know what I am doing wrong, and why I have to manually edit the cmake output?

I am also facing this issue. I want to use libtorch on a cluster, where CUDA is not installed in /usr/.

@Miles_Cranmer, I followed your workaround, and I was able to compile my small example. It also works with CUDA.

There is a problem with your last post, though: you say you edit the same file twice. I think the second file you are referring to is link.txt. I found it using the command:

grep -nr /usr/local/cuda/

In link.txt, I removed the arguments (or parts of arguments) that referred to this path, and it worked.
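
If anyone wants to script this workaround instead of hand-editing, something like the following sed commands should be roughly equivalent to the manual edits above (commenting out the two bogus prerequisites in build.make and stripping the bad paths from link.txt). The paths are the ones quoted in this thread, and this is only a sketch that assumes the generated lines match those quotes exactly; adjust everything to your own setup:

# comment out the two bogus prerequisites in build.make
sed -i -e 's|^run_pytorch: /usr/lib64/libcuda.so|# &|' \
       -e 's|^run_pytorch: /usr/local/cuda/lib64/libculibos.a|# &|' \
       CMakeFiles/run_pytorch.dir/build.make

# drop the non-existent paths from the actual link command
sed -i -e 's|/usr/local/cuda/lib64/libculibos.a||g' \
       -e 's|/cm/shared/sw/pkg/devel/cudnn/v7.0-cuda-9.0/lib64||g' \
       CMakeFiles/run_pytorch.dir/link.txt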

I ran

grep -nr /usr/local/cuda

inside the libtorch source directory, and I got these results:

share/cmake/Caffe2/Caffe2Targets.cmake:82:  INTERFACE_LINK_LIBRARIES "caffe2::cudart;c10_cuda;caffe2;caffe2::cufft;caffe2::curand;caffe2::cudnn;/usr/local/cuda/lib64/libculibos.a;dl;/usr/local/cuda/lib64/libculibos.a;caffe2::cublas"
share/cmake/Caffe2/Modules_CUDA_fix/upstream/FindCUDA.cmake:34:# ``CUDA_BIN_PATH=/usr/local/cuda1.0`` instead of the default
share/cmake/Caffe2/Modules_CUDA_fix/upstream/FindCUDA.cmake:35:# ``/usr/local/cuda``) or set ``CUDA_TOOLKIT_ROOT_DIR`` after configuring.  If
share/cmake/Caffe2/Modules_CUDA_fix/upstream/FindCUDA.cmake:939:    list(APPEND CUDA_LIBRARIES -Wl,-rpath,/usr/local/cuda/lib)
share/cmake/Gloo/GlooTargets.cmake:58:  INTERFACE_LINK_LIBRARIES "/usr/local/cuda/lib64/libcudart.so;\$<LINK_ONLY:pthread>"
share/cmake/Gloo/GlooTargets.cmake:74:  INTERFACE_LINK_LIBRARIES "/usr/local/cuda/lib64/libcudart.so;gloo;/pytorch/build/nccl/lib/libnccl_static.a;dl;rt"
Binary file lib/libcaffe2_gpu.so matches

I think this might be a problem, since not everybody has CUDA installed under /usr/.

I removed the references to /usr/local/cuda inside the libtorch source, and now there is no need to edit build.make and link.txt before running make. I will try to find the time to make a pull request.
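
In case it helps until a pull request lands, here is a rough sketch of that edit done as a search-and-replace rather than a deletion, under the assumption that your CUDA toolkit actually provides the referenced files (e.g. lib64/libculibos.a and lib64/libcudart.so) under its own root; the match inside the libcaffe2_gpu.so binary is left alone:

# run from the top of the libtorch directory
CUDA_ROOT=/cm/shared/sw/pkg/devel/cuda/9.0.176   # adjust to your own CUDA install
sed -i "s|/usr/local/cuda|${CUDA_ROOT}|g" \
    share/cmake/Caffe2/Caffe2Targets.cmake \
    share/cmake/Gloo/GlooTargets.cmake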