Issues about torch::deploy demo

JiayiFeng · June 22, 2022, 2:37pm

I’m trying to run the demo from the torch::deploy page of the PyTorch 1.11.0 documentation, but I run into problems when compiling and running the C++ code in the section “Loading and running the model in C++”.

When compiling with the CMakeLists.txt provided in the section “Building and running the application”, I get this error message:

(torch-deploy) root@0e5fa48bd947:~/torch-deploy_test/example-app/build# cmake --build . --config Release
[ 50%] Building CXX object CMakeFiles/example-app.dir/example-app.cpp.o
[100%] Linking CXX executable example-app
/usr/bin/ld: CMakeFiles/example-app.dir/example-app.cpp.o: in function `main':
example-app.cpp:(.text+0x2eb): undefined reference to `torch::deploy::InterpreterManager::InterpreterManager(unsigned long, std::shared_ptr<torch::deploy::Environment>)'
/usr/bin/ld: example-app.cpp:(.text+0x359): undefined reference to `torch::deploy::InterpreterManager::loadPackage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)'
/usr/bin/ld: CMakeFiles/example-app.dir/example-app.cpp.o: in function `torch::deploy::InterpreterManager::acquireOne()':
example-app.cpp:(.text._ZN5torch6deploy18InterpreterManager10acquireOneEv[_ZN5torch6deploy18InterpreterManager10acquireOneEv]+0x3e): undefined reference to `torch::deploy::LoadBalancer::acquire()'
/usr/bin/ld: example-app.cpp:(.text._ZN5torch6deploy18InterpreterManager10acquireOneEv[_ZN5torch6deploy18InterpreterManager10acquireOneEv]+0xab): undefined reference to `torch::deploy::InterpreterSession::~InterpreterSession()'
/usr/bin/ld: CMakeFiles/example-app.dir/example-app.cpp.o: in function `torch::deploy::Package::loadPickle(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)':
example-app.cpp:(.text._ZN5torch6deploy7Package10loadPickleERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_[_ZN5torch6deploy7Package10loadPickleERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_]+0x198): undefined reference to `torch::deploy::InterpreterSession::createMovable(torch::deploy::Obj)'
/usr/bin/ld: example-app.cpp:(.text._ZN5torch6deploy7Package10loadPickleERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_[_ZN5torch6deploy7Package10loadPickleERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_]+0x1a8): undefined reference to `torch::deploy::InterpreterSession::~InterpreterSession()'
/usr/bin/ld: example-app.cpp:(.text._ZN5torch6deploy7Package10loadPickleERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_[_ZN5torch6deploy7Package10loadPickleERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES9_]+0x23e): undefined reference to `torch::deploy::InterpreterSession::~InterpreterSession()'
/usr/bin/ld: CMakeFiles/example-app.dir/example-app.cpp.o: in function `torch::deploy::Package::acquireSession()':
example-app.cpp:(.text._ZN5torch6deploy7Package14acquireSessionEv[_ZN5torch6deploy7Package14acquireSessionEv]+0xad): undefined reference to `torch::deploy::InterpreterSession::~InterpreterSession()'
/usr/bin/ld: example-app.cpp:(.text._ZN5torch6deploy7Package14acquireSessionEv[_ZN5torch6deploy7Package14acquireSessionEv]+0xdf): undefined reference to `torch::deploy::InterpreterSession::~InterpreterSession()'
/usr/bin/ld: CMakeFiles/example-app.dir/example-app.cpp.o: in function `void std::_Destroy<torch::deploy::Interpreter>(torch::deploy::Interpreter*)':
example-app.cpp:(.text._ZSt8_DestroyIN5torch6deploy11InterpreterEEvPT_[_ZSt8_DestroyIN5torch6deploy11InterpreterEEvPT_]+0x18): undefined reference to `torch::deploy::Interpreter::~Interpreter()'
collect2: error: ld returned 1 exit status
make[2]: *** [CMakeFiles/example-app.dir/build.make:112: example-app] Error 1
make[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/example-app.dir/all] Error 2
make: *** [Makefile:91: all] Error 2

The linker cannot find some torch::deploy symbols, and they do not seem to be defined in any .so file in the ${conda_env}/lib/python3.9/site-packages/torch directory. I then searched PyTorch’s build/lib directory and found these symbols in libtorch_deploy_internal.a, so I added this line to the CMakeLists.txt:

target_link_libraries(example-app ${my_pytorch_code_dir}/build/lib/libtorch_deploy_internal.a dl)

With that change it builds successfully. But when I run it and try to load a model package, it crashes with this error message:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Exception Caught inside torch::deploy embedded library:
Check failed: (libStart != nullptr && libEnd != nullptr), function writeDeployInterpreter, file /pytorch/torch/csrc/deploy/deploy.cpp, line 80.
torch::deploy requires a build-time dependency on embedded_interpreter or embedded_interpreter_cuda, neither of which were found.  torch::cuda::is_available()=1

Aborted (core dumped)

I ran all of the above tests in an nvidia/cuda:11.6.0-devel-ubuntu20.04 Docker container. torch::deploy was installed following the instructions on the torch::deploy page of the PyTorch 1.11.0 documentation. The PyTorch source version is v1.12.0-rc7.

I have no idea what the root cause is or how to solve it. Can anyone help? Many thanks!

s4ayub · June 22, 2022, 3:10pm

Follow pytorch/deploy.rst at master · pytorch/pytorch · GitHub instead.
The v1.11 docs on the PyTorch website are outdated.

The v1.12+ docs on the PyTorch website have the updated instructions: see the torch::deploy page of the PyTorch 1.12 documentation.
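
For orientation, the build setup in the updated docs looks roughly like the sketch below. Treat it as an approximation of deploy.rst and check it against the file itself. DEPLOY_INTERPRETER_PATH and DEPLOY_DIR are assumed here to be variables you pass to cmake, pointing at the build/torch/csrc/deploy and torch/csrc/deploy directories of your PyTorch checkout, and the list of system libraries may differ on your machine. The main difference from linking libtorch_deploy_internal.a directly is that the embedded interpreter object (libtorch_deployinterpreter.o) is built into a library that is linked as a whole, and the executable is linked with -rdynamic, which appears to be what lets deploy.cpp locate the embedded interpreter at run time.

```cmake
cmake_minimum_required(VERSION 3.19 FATAL_ERROR)
project(example-app)

find_package(fmt REQUIRED)
find_package(Torch REQUIRED)

# Assumption: both variables are supplied on the cmake command line, e.g.
#   -DDEPLOY_INTERPRETER_PATH=<pytorch>/build/torch/csrc/deploy
#   -DDEPLOY_DIR=<pytorch>/torch/csrc/deploy
add_library(torch_deploy_internal STATIC
    ${DEPLOY_INTERPRETER_PATH}/libtorch_deployinterpreter.o
    ${DEPLOY_DIR}/deploy.cpp
    ${DEPLOY_DIR}/loader.cpp
    ${DEPLOY_DIR}/path_environment.cpp
    ${DEPLOY_DIR}/elf_file.cpp)

# System libraries needed by the embedded Python builtins
# (this list may need adjusting for your distribution).
target_link_libraries(torch_deploy_internal PRIVATE
    crypt pthread dl util m z ffi lzma readline nsl ncursesw panelw)
target_link_libraries(torch_deploy_internal PUBLIC
    shm torch fmt::fmt-header-only)

# Wraps the static library so that all of its objects are kept at link time,
# including the embedded interpreter that is only looked up dynamically.
caffe2_interface_library(torch_deploy_internal torch_deploy)

add_executable(example-app example-app.cpp)
# -rdynamic exports the executable's symbols so they can be found at run time.
target_link_libraries(example-app PUBLIC
    "-Wl,--no-as-needed -rdynamic" dl torch_deploy "${TORCH_LIBRARIES}")
```

If the run-time check on libStart and libEnd still fails after this, it is worth confirming that libtorch_deployinterpreter.o actually exists in your PyTorch build directory, since it is only produced when PyTorch itself was built with torch::deploy enabled.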