Running PyTorch models on host x86

Marat · June 18, 2019, 12:57pm

Sorry for a possibly stupid question, but it arises naturally from what is, in my opinion, a lack of complete step-by-step documentation for the transition to release.

To the best of my knowledge, the only way to effectively execute PyTorch models on host devices (using GPU, quantization to int8, and some compiler optimizations) is:

1. Make JIT code from the PyTorch model
2. Save the model to disk in ONNX format
3. Write an application, or use an existing one, to load and run the previously saved ONNX model

My question is: how self-contained will the resulting code be? Which libraries and other environment features will it need in order to run on an x86 host server with a GPU on the customer side?

jfix · June 18, 2019, 3:58pm

Not stupid at all!

Right now we support the ONNXIFI interface which allows PyTorch/Caffe2 to use Glow as an execution backend. Through ONNXIFI (technically FOXI as noted there) the model is passed as an ONNX or C2 proto to Glow, loaded/compiled/quantized/etc., and run on one of our backends.

We also have a very basic PR up here (has not yet landed) that creates an actual Glow backend for PyTorch, which goes more directly from PyTorch IR to Glow.

Marat · June 19, 2019, 12:09pm

Building Glow (https://github.com/pytorch/glow) is extremely painful, and it comes with the additional pain of building a custom LLVM (https://solarianprogrammer.com/2013/01/17/building-clang-libcpp-ubuntu-linux/). I am still trying to build Glow: first I ran into an issue with libpng and zlib, and now I have a problem with protobuf. Is the approach you suggest simple?

jfix · June 19, 2019, 4:11pm

I’m sorry you’ve had such a bad experience. If you have feedback we’d happily take it to improve developer experience.

If you’re on Ubuntu 18.04 (like the link you posted says), I believe you should be able to use apt-get to install all of our dependencies. Have you followed the instructions on our README? It says it’s been tested on 16.04 at least. Or are you building everything manually?

The instructions for installing and using ONNXIFI are on that page, but I haven’t tried it myself. That page does say it can be a little tricky but I do not know the details there. If you try and run into issues you can always reach out for help.

Marat · June 20, 2019, 9:34am

I have the following issue with protobuf:

lib/Importer/libImporter.a(caffe2.pb.cc.o):(.data.rel.ro+0xaf0): undefined reference to `google::protobuf::Message::InitializationErrorString[abi:cxx11]() const'
lib/Importer/libImporter.a(caffe2.pb.cc.o):(.data.rel.ro+0xb90): undefined reference to `google::protobuf::Message::GetTypeName[abi:cxx11]() const'
lib/Importer/libImporter.a(caffe2.pb.cc.o):(.data.rel.ro+0xbc8): undefined reference to `google::protobuf::Message::InitializationErrorString[abi:cxx11]() const'
lib/Importer/libImporter.a(caffe2.pb.cc.o):(.data.rel.ro+0xc68): undefined reference to `google::protobuf::Message::GetTypeName[abi:cxx11]() const'
lib/Importer/libImporter.a(caffe2.pb.cc.o):(.data.rel.ro+0xca0): undefined reference to `google::protobuf::Message::InitializationErrorString[abi:cxx11]() const'
clang-8: error: linker command failed with exit code 1 (use -v to see invocation)
[214/282] Linking CXX executable bin/char-rnn
ninja: build stopped: subcommand failed.

My protobuf version is 3.6.1, which is >= 2.6.1:

marat@moon:~/glow/build_Debug$ which protoc
/home/marat/anaconda3/bin/protoc
marat@moon:~/glow/build_Debug$ protoc --version
libprotoc 3.6.1
marat@moon:~/glow/build_Debug$

I would also really appreciate it if you could tell me where you got llvm-8, because I could not find it via apt on my Ubuntu 16.04.

UPDATE

I removed Anaconda from $PATH and it helped. But I do not know how this will work: PyTorch with protobuf 3.6.1 and Glow compiled against 2.6.1. So it seems that the problem at the compilation stage occurred because of a library version mismatch.
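
The workaround above can be sketched as a small helper. This is only an illustration, assuming the conflicting protoc lives in a directory whose path contains "anaconda" (as in the `which protoc` output above); the function name `drop_path_entries` is hypothetical and not part of Glow or PyTorch.

```python
import os
import shutil


def drop_path_entries(path_value, needle, sep=os.pathsep):
    """Return path_value without the entries that contain `needle`."""
    kept = [entry for entry in path_value.split(sep)
            if entry and needle not in entry]
    return sep.join(kept)


if __name__ == "__main__":
    original = os.environ.get("PATH", "")
    # Assumption: the unwanted protoc comes from an Anaconda directory.
    cleaned = drop_path_entries(original, "anaconda")

    # Show which protoc would be picked up before and after the change.
    print("protoc before:", shutil.which("protoc", path=original))
    print("protoc after: ", shutil.which("protoc", path=cleaned))

    # Printed so it can be pasted into the shell before re-running cmake.
    print("export PATH=" + cleaned)
```

This only changes which protoc and protobuf headers the build is likely to find first; it does not answer whether mixing protobuf versions between PyTorch and Glow is safe at run time.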

Marat · June 20, 2019, 11:40am

I am also trying to build Glow with:

cmake -G Ninja -DCMAKE_BUILD_TYPE=Release -DGLOW_WITH_CPU=1 -DGLOW_WITH_OPENCL=1 ..

But I recently ran into some problems:

CMake Error at lib/Backends/OpenCL/CMakeLists.txt:32 (add_library):
  Target "OpenCLBackend" links to target "OpenCL::OpenCL" but the target was
  not found.  Perhaps a find_package() call is missing for an IMPORTED
  target, or an ALIAS target is missing?

What am I supposed to do?

I believe the problem comes from

target_link_libraries(OpenCLBackend
                      PUBLIC
                      OpenCL::OpenCL)

in

lib/Backends/OpenCL/CMakeLists.txt

find_package finds OpenCL successfully (there are no error messages about it).

What is “OpenCL::OpenCL”?

UPDATE

This issue was solved by using a more recent CMake. CMake 3.5.2 DID NOT WORK; please update the minimum required CMake version.
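
A rough pre-flight check for this situation could look like the sketch below. It assumes that the `OpenCL::OpenCL` imported target is provided by CMake's FindOpenCL module starting with CMake 3.7; that threshold is taken from the CMake documentation as I understand it, not from this thread, so please verify it. The helper names are hypothetical.

```python
import re
import shutil
import subprocess

# Assumption: FindOpenCL defines the OpenCL::OpenCL target from CMake 3.7 on.
MIN_CMAKE = (3, 7)


def parse_cmake_version(text):
    """Extract (major, minor, patch) from `cmake --version` output."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError("no CMake version found in: %r" % text)
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def supports_opencl_target(version, minimum=MIN_CMAKE):
    """True if `version` is at least `minimum` (compared on major.minor)."""
    return version[:len(minimum)] >= minimum


if __name__ == "__main__":
    if shutil.which("cmake") is None:
        print("cmake not found on PATH")
    else:
        output = subprocess.run(["cmake", "--version"],
                                stdout=subprocess.PIPE,
                                universal_newlines=True,
                                check=True).stdout
        version = parse_cmake_version(output)
        verdict = "ok" if supports_opencl_target(version) else "too old"
        print("CMake", ".".join(map(str, version)), "->", verdict)
```

With the version reported above, `parse_cmake_version("cmake version 3.5.2")` gives `(3, 5, 2)`, which falls below the assumed minimum. That would be consistent with find_package succeeding while the imported target is missing.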

jfix · June 21, 2019, 11:23pm

Thanks for the feedback! Would you mind posting these problems as GH issues via this link? We will be sure to fix them.