Compiling a static PyTorch library from source

Hi,

I am interested in building the libtorch library statically. To do this, I build the repo by running the python setup.py build command with the following environment variables set:

export CMAKE_CXX_COMPILER=/usr/bin/g++-11
export BUILD_PYTHON=OFF
export CXX=/usr/bin/g++-11
export USE_CUDA=OFF
export CMAKE_POSITION_INDEPENDENT_CODE=ON
export BUILD_SHARED_LIBS=OFF
export CMAKE_C_COMPILER=/usr/bin/gcc-11
export CC=/usr/bin/gcc-11
export CXXFLAGS="-fPIC"
export CFLAGS="-fPIC"

However, after a while I get many linker errors of this type:

LayoutManager.cpp:(.text+0x1690): multiple definition of `torch::nativert::LayoutManager::deallocate_and_plan()'; test_nativert/CMakeFiles/test_nativert.dir/__/__/__/torch/nativert/executor/memory/LayoutManager.cpp.o:LayoutManager.cpp:(.text+0x2770): first defined here
collect2: error: ld returned 1 exit status

Many different functions seem to be defined in several places. I have not added any code to the PyTorch repo, so I am perplexed as to why this happens. In general, building a static libtorch library is turning out to be a real pain. If anyone has suggestions on how to do it, they would be greatly appreciated!
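For completeness, the exact sequence I run is the variables above followed by the build step, i.e. roughly this script (compiler paths are just how my machine is set up):

```shell
# Static, CPU-only libtorch build configuration (same variables as listed above)
export CC=/usr/bin/gcc-11
export CXX=/usr/bin/g++-11
export CMAKE_C_COMPILER=/usr/bin/gcc-11
export CMAKE_CXX_COMPILER=/usr/bin/g++-11
export CFLAGS="-fPIC"
export CXXFLAGS="-fPIC"
export USE_CUDA=OFF
export BUILD_PYTHON=OFF
export BUILD_SHARED_LIBS=OFF
export CMAKE_POSITION_INDEPENDENT_CODE=ON

# Run from the root of the pytorch checkout
python setup.py build
```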

Hey!

I thought our libtorch package from https://pytorch.org/get-started/locally/ contains pre-built static binaries?

Hi albanD,

Do you mean downloading it from this link here?


I have already tried that. When you do, you get a libtorch.so shared library. I have seen somewhere that if you change “shared” to “static” in the download link you get the static variant. Unfortunately, that is not quite true: libtorch.so is still there, although all the other libraries are static. What I want is a libtorch.a static library. I managed to compile one from source by setting the following environment variables:

BUILD_TEST=0
CMAKE_CXX_COMPILER=/usr/bin/g++-11
USE_MKLDNN=1
CXX=/usr/bin/g++-11
CXXFLAGS=-fPIC
USE_CUDA=OFF
CMAKE_POSITION_INDEPENDENT_CODE=ON
BUILD_SHARED_LIBS=OFF
CMAKE_C_COMPILER=/usr/bin/gcc-11
CC=/usr/bin/gcc-11
CFLAGS=-fPIC

After that I ran python setup.py build and managed to build a libtorch.a. However, I am still having lots of issues: many of the operators are missing, and when I try to load a model I trained, I get an error like this:

terminate called after throwing an instance of 'torch::jit::ErrorReport'
  what():  
Unknown builtin op: aten::mul.
Could not find any similar ops to aten::mul. This op may not exist or may not be currently supported in TorchScript.
:
  File "<string>", line 3

def mul(a : float, b : Tensor) -> Tensor:
  return b * a
         ~~~~~ <--- HERE
def add(a : float, b : Tensor) -> Tensor:
  return b + a
'mul' is being compiled since it was called from 'gelu_0_9'
  File "<string>", line 3

def gelu_0_9(self: Tensor) -> Tensor:
  return torch.gelu(self, approximate='none')
                                      ~~~~~~ <--- HERE

It would be amazing to have more documentation and support for static binaries. In our case we are trying to deploy our models on-prem, where it is much better to statically link the libraries.
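From what I have read, the missing-operator symptom can come from the linker itself: operator registrations like aten::mul live in static initializers inside the archive, and since nothing references those objects directly, the linker drops them. The usual workaround is to force the whole archive into the binary. A minimal CMake sketch of what I am trying (target name and archive paths are illustrative, not an official recipe):

```cmake
# Hypothetical consumer project linking the static libtorch.a
cmake_minimum_required(VERSION 3.18)
project(static_torch_app CXX)

add_executable(app main.cpp)

# Force the linker to keep every object in libtorch.a, so the static
# initializers that register ops (e.g. aten::mul) are not discarded.
target_link_libraries(app PRIVATE
  -Wl,--whole-archive
  /path/to/libtorch.a
  -Wl,--no-whole-archive
  # the remaining static archives (c10, etc.) are linked normally
)
```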

Cheers!

Oh yeah, sorry: I was just told that we removed them from the binaries we ship in the early 2.x versions.

Are you running things with TorchScript? I am afraid it has not been maintained for the past four or five years. If you are looking for a C++ runtime for PyTorch, ExecuTorch (see the ExecuTorch documentation) is the new version of this.

Hi albanD,

Thank you for your help! I was not aware that ExecuTorch is now the go-to method for deploying models; most of the documentation and tutorials talk about TorchScript. I will take a look at ExecuTorch, and hopefully it will solve my problems.

Cheers!