Working recipe for compiling LibTorch from source

ipoletaev · June 14, 2023, 6:30am

I wonder if there are any tested recipes for compiling LibTorch from source that result in exactly the same ready-to-use package as the one linked on the main page here?

The instructions mentioned here don’t really work well. For example, with the image nvidia/cuda:11.7.1-cudnn8-devel-ubuntu20.04, while trying to build a release branch:

WORKDIR /workspace
RUN git clone -b v2.0.0 --recurse-submodule https://github.com/pytorch/pytorch.git
WORKDIR /workspace/pytorch/build
ARG TORCH_CUDA_ARCH_LIST="Ampere"
RUN cmake \
    -DBUILD_SHARED_LIBS:BOOL=ON \
    -DCMAKE_BUILD_TYPE:STRING=Release \
    -DPYTHON_EXECUTABLE:PATH=`which python3` \
    -DBUILD_PYTHON:BOOL=OFF \
    -DCMAKE_INSTALL_PREFIX:PATH=/workspace/libtorch .. && \
    cmake --build . --target install --config Release -- -j$(nproc)

the installation does not produce all of the shared libraries, e.g. libnvfuser_codegen.so is missing.
There are a few related and still unanswered questions: this and this.
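
For anyone trying to reproduce this, here is a minimal sketch of how the missing libraries can be listed. The helper name `missing_libs` is hypothetical, and the list of expected names has to be supplied by hand (for example, taken from the `lib` directory of the prebuilt package); apart from libnvfuser_codegen.so, the names in the usage comment are only an illustration.

```shell
#!/bin/sh
# Hypothetical helper: print every expected library that is absent from a directory.
# Usage: missing_libs <lib_dir> <name>...
missing_libs() {
    lib_dir=$1
    shift
    for name in "$@"; do
        # -e is true for regular files and for symlinks that resolve
        if [ ! -e "$lib_dir/$name" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example call (paths assumed from the Dockerfile above):
#   missing_libs /workspace/libtorch/lib libtorch.so libc10.so libnvfuser_codegen.so
```

An empty output means every listed library is present in the install tree.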

Thanks in advance!

ptrblck · June 14, 2023, 2:23pm

The pytorch/builder repository contains the scripts we use to build these binaries so you could take a look at it.

ipoletaev · June 15, 2023, 3:47am

Thanks! Indeed, as I suspected, the build logic copies everything that was compiled.
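
A rough sketch of that kind of copy step, in case it helps others. This is not the actual pytorch/builder script; the helper name `copy_built_libs` is hypothetical, and it assumes the compiled shared libraries end up in a single directory of the build tree (`build/lib` in the usage comment is an assumption).

```shell
#!/bin/sh
# Hypothetical helper: copy every shared library from a build directory
# into an install directory, creating the destination if needed.
# Usage: copy_built_libs <src_dir> <dst_dir>
copy_built_libs() {
    src_dir=$1
    dst_dir=$2
    mkdir -p "$dst_dir"
    # Match libfoo.so as well as versioned names such as libfoo.so.1;
    # cp -P copies symlinks as symlinks instead of following them.
    find "$src_dir" -maxdepth 1 \( -type f -o -type l \) -name '*.so*' \
        -exec cp -P {} "$dst_dir/" \;
}

# Example call (both paths are assumptions based on the Dockerfile above):
#   copy_built_libs /workspace/pytorch/build/lib /workspace/libtorch/lib
```

Only the top level of the source directory is scanned, so libraries in nested directories would need a separate call.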

Maybe it’s worth mentioning this in the PyTorch repo docs so that others aren’t confused in the future?

ptrblck · June 15, 2023, 4:29am

This might be a good idea! Would you be interested in updating the docs?

ipoletaev · June 21, 2023, 4:00am

Well, before doing so I’d need to find out why the Docker cmake command mentioned above produces a LibTorch build that, on a simple toy example, fails with:

what():  Type c10::intrusive_ptr<LinearPackedParamsBase> could not be converted to any of the known types.
Exception raised from operator() at /workspace/pytorch/aten/src/ATen/core/jit_type.h:1793 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x60 (0x7fca98ab29a0 in /app/libs/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11:

This issue looks similar, but it has already been fixed…