PyTorch 0.4.1: undefined symbol when importing a cpp_extension

bcharlier · September 3, 2018, 3:48pm

Dear community members,

Description of the problem:

I am writing and using a custom C++ extension based on pybind11 and ATen (Python 3 only). Everything worked well with PyTorch <= 0.4 (under Linux and macOS, with both gcc and clang). I use my own CMake routine to compile the extension.

Unfortunately, the latest version of PyTorch introduced a problem: the extension still compiles fine, but it crashes when I import the created module (which is in fact a shared object called libKeOpstorch6698ab2e06.cpython-36m-x86_64-linux-gnu.so). Under Python 3 it gives:

import libKeOpstorch6698ab2e06
[...]
ImportError: [...]/pykeops/build/libKeOpstorch6698ab2e06.cpython-36m-x86_64-linux-gnu.so: 
undefined symbol: _ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE

The error was reproduced on Debian testing (Python 3.6) and Ubuntu 16.04 LTS (Python 3.5).

Comments:

After demangling, the missing symbol reads:

at::Error::Error(at::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)
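For anyone who wants to reproduce the demangling step, here is a minimal sketch in Python. It assumes the `c++filt` tool from binutils is on the PATH; the helper name `demangle` is made up for this illustration, and it returns None when the tool is not installed.

```python
import shutil
import subprocess
from typing import Optional

# The undefined symbol reported by the ImportError above.
MANGLED = (
    "_ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112"
    "basic_stringIcSt11char_traitsIcESaIcEEE"
)


def demangle(symbol: str) -> Optional[str]:
    """Demangle one C++ symbol with c++filt, or return None if it is missing."""
    tool = shutil.which("c++filt")
    if tool is None:
        return None  # binutils not installed; nothing we can do here
    result = subprocess.run(
        [tool, symbol],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(demangle(MANGLED))
```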

Looking for a solution, I ended up reading the file [...]/lib/python3.5/site-packages/torch/lib/include/ATen/Error.h. It has evolved a lot between 0.4 and 0.4.1. In v0.4.1, this header declares a constructor at::Error::Error(SourceLocation source_location, std::string err), which is close to the undefined symbol…

I suspect an unfortunate cast between std::string and std::__cxx11::basic_string (maybe by pybind11)… but I am currently stuck.

I would appreciate any ideas or comments.

Best,

b.

SimonW · September 4, 2018, 2:38am

cc @goldsborough, who knows the most about C++ extensions

tom · September 4, 2018, 7:41am

The __cxx11 superficially looks related to the C++ ABI switch in gcc 5. You could either try to compile both PyTorch (from source) and your extension with the same compiler, or share which compiler version you used for your extension.
If you have a very recent CUDA, gcc 6 is a great choice. I have used gcc 5 successfully since PyTorch 0.1.2, but I always compiled PyTorch myself, so I didn’t have consistency issues.

Best regards

Thomas
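The `__cxx11` hint can also be checked mechanically on mangled names. Below is a rough sketch, not a full demangler: it only looks for the two markers that libstdc++'s new ABI leaves in mangled symbols. The function name `references_cxx11_abi` is invented for this example, and `OLD_ABI_GUESS` is an assumption about how the same constructor would presumably be mangled with the pre-gcc-5 std::string; it is not taken from an actual PyTorch binary.

```python
# Markers left in mangled names by libstdc++'s new (gcc 5+) ABI.
CXX11_NAMESPACE = "7__cxx11"  # the std::__cxx11 inline namespace
CXX11_ABI_TAG = "B5cxx11"     # the [abi:cxx11] tag attached to some functions


def references_cxx11_abi(mangled: str) -> bool:
    """Heuristic: True if the mangled symbol mentions the new libstdc++ ABI."""
    return CXX11_NAMESPACE in mangled or CXX11_ABI_TAG in mangled


# The symbol the extension asks for (from the ImportError in this thread).
MISSING = (
    "_ZN2at5ErrorC1ENS_14SourceLocationENSt7__cxx1112"
    "basic_stringIcSt11char_traitsIcESaIcEEE"
)

# ASSUMPTION: the old-ABI spelling of the same constructor, where the
# std::string parameter is abbreviated to "Ss".
OLD_ABI_GUESS = "_ZN2at5ErrorC1ENS_14SourceLocationESs"

if __name__ == "__main__":
    print(references_cxx11_abi(MISSING))        # True
    print(references_cxx11_abi(OLD_ABI_GUESS))  # False
```

If the extension requests new-ABI symbols while the installed library only exports old-ABI ones, the dynamic loader cannot match them, which would be consistent with the ImportError above. The same check could be applied to each line of `nm -D --undefined-only` output for the extension's .so file.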

bcharlier · September 4, 2018, 11:51am

Hi tom,

I have tested on Debian testing (no CUDA) with g++ 5, 6 and 7, as well as with clang++ 6, and on Ubuntu 18.04 (with CUDA) with g++ 5. The same symbol remains undefined.

I will try to rebuild v0.4.1 from source as soon as possible, to check whether the symbol is resolved.

bcharlier · September 4, 2018, 3:12pm

Indeed, it works with the version built from source (branch: master, commit: 0d5e4a2c6)… I’m not sure it is really good news, though. Asking end users to rebuild PyTorch from source is, for our project, a bit too much.

I hope the next Python package will not suffer from the same problem…

Anyway, thank you for your kind and quick answer.

tom · September 5, 2018, 8:34pm

At some point in time, PyTorch was compiled with g++ 4.8 or some such (before the C++ ABI switch), and g++ 5, 6 and 7 all come after it (I don’t know whether you can use a switch for compatibility).
Personally, I’d hope that one could move to a newer gcc, but even the "manylinux 2010" seems to decree gcc 4 or something like that.

Best regards

Thomas
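On the question of a compatibility switch: libstdc++ documents a macro, `_GLIBCXX_USE_CXX11_ABI`, that selects the old (0) or new (1) std::string ABI at compile time. The sketch below shows one way a build script might derive the matching compiler flag. It rests on assumptions: the helper names are made up, and the attribute `torch._C._GLIBCXX_USE_CXX11_ABI` is assumed to be present in the installed PyTorch build (it may well be absent in 0.4.1), so the code falls back gracefully when it is missing.

```python
from typing import Optional


def cxx11_abi_flag(use_cxx11_abi: bool) -> str:
    """Build the libstdc++ define that selects the old (0) or new (1) ABI."""
    return "-D_GLIBCXX_USE_CXX11_ABI={}".format(int(use_cxx11_abi))


def detect_torch_abi() -> Optional[bool]:
    """Ask the installed PyTorch which ABI it was built with, if it says so.

    ASSUMPTION: torch._C._GLIBCXX_USE_CXX11_ABI may not exist in every
    PyTorch version; None is returned when torch or the attribute is missing.
    """
    try:
        import torch
    except ImportError:
        return None
    value = getattr(torch._C, "_GLIBCXX_USE_CXX11_ABI", None)
    return None if value is None else bool(value)


if __name__ == "__main__":
    abi = detect_torch_abi()
    # Fall back to the old ABI, which is what this thread suggests the
    # 0.4.1 binary packages were built with.
    print(cxx11_abi_flag(abi if abi is not None else False))
```

With a custom CMake build, the resulting string would have to be added to the C++ compile flags of the extension (for example through CMAKE_CXX_FLAGS). The macro only has an effect when compiling against libstdc++, whether with gcc or clang.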