Finally, I solved the problem:
It’s caused by _GLIBCXX_USE_CXX11_ABI=1 being set when compiling pytorch from source. That means the C++ std::string ABI doesn’t match between the pytorch build and the cpp extension build.
There are two ways to solve this problem:
1. Build cpp extensions with -D_GLIBCXX_USE_CXX11_ABI=1.
2. Build pytorch with -D_GLIBCXX_USE_CXX11_ABI=0.
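As a rough sketch of the first option, the helper below turns the ABI setting that PyTorch reports into the matching compiler flag. The function names abi_define and torch_abi_define are made up for illustration; torch._C._GLIBCXX_USE_CXX11_ABI is the attribute mentioned further down in this thread, and the import is guarded so the snippet still runs where PyTorch is not installed.

```python
def abi_define(use_cxx11_abi):
    """Return the -D flag matching a given libstdc++ ABI setting."""
    return "-D_GLIBCXX_USE_CXX11_ABI={}".format(1 if use_cxx11_abi else 0)


def torch_abi_define():
    """Return the flag matching the installed PyTorch, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    # Attribute name taken from this thread; it may differ in other versions.
    return abi_define(bool(torch._C._GLIBCXX_USE_CXX11_ABI))


if __name__ == "__main__":
    # The printed flag is what the extension build would need to see.
    print(torch_abi_define())
```

The flag would then go into the extension build, for example through extra_compile_args in setup.py.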
Here is how I figured it out:
1. First I checked the symbols in the newly installed v1.0.0 pytorch .so files in ~/anaconda3/lib/python3.6/site-packages/torch, which is my pytorch path:
U _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN3c107Warning4warnENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
000000000071c4e0 T _ZN5torch3jit6tracer20recordSourceLocationEPNS0_4NodeE
000000000071c4f0 T _ZN5torch3jit6tracer23setRecordSourceLocationEPFvPNS0_4NodeEE
000000000071be60 T _ZN5torch3jit6tracer27defaultRecordSourceLocationEPNS0_4NodeE
000000000060ce30 W _ZNSt14_Function_base13_Base_managerIZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS2_14SourceLocationEEN3c108ArrayRefIPNS2_5ValueEEENS8_IhEESB_EUlRSt6vectorINS7_6IValueESaISE_EEE_E10_M_managerERSt9_Any_dataRKSK_St18_Manager_operation
000000000060ce20 W _ZNSt17_Function_handlerIFiRSt6vectorIN3c106IValueESaIS2_EEEZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS8_14SourceLocationEENS1_8ArrayRefIPNS8_5ValueEEENSD_IhEESG_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_
0000000000d9aa30 V _ZTIN5torch3jit14SourceLocationE
0000000000da23a0 V _ZTIZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS0_14SourceLocationEEN3c108ArrayRefIPNS0_5ValueEEENS6_IhEES9_EUlRSt6vectorINS5_6IValueESaISC_EEE_
000000000097e5a0 V _ZTSN5torch3jit14SourceLocationE
00000000009ccaa0 V _ZTSZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS0_14SourceLocationEEN3c108ArrayRefIPNS0_5ValueEEENS6_IhEES9_EUlRSt6vectorINS5_6IValueESaISC_EEE_
0000000000da1b58 V _ZTVN5torch3jit14SourceLocationE
U _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN3c107Warning4warnENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN3c107Warning4warnENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN3c107Warning19set_warning_handlerEPFvRKNS_14SourceLocationEPKcE
U _ZN3c107Warning4warnENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
U _ZN5torch3jit6tracer20recordSourceLocationEPNS0_4NodeE
U _ZN5torch3jit6tracer23setRecordSourceLocationEPFvPNS0_4NodeEE
000000000036f230 T _ZN5torch3jit6tracer26pythonRecordSourceLocationEPNS0_4NodeE
0000000000374000 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv
0000000000373f30 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv
0000000000374070 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE14_M_get_deleterERKSt9type_info
0000000000373ff0 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EED0Ev
0000000000373f20 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EED1Ev
0000000000373f20 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EED2Ev
00000000009ec4b8 V _ZTIN5torch3jit14SourceLocationE
00000000009ef470 V _ZTIN5torch3jit20StringSourceLocationE
00000000009ef4d8 V _ZTISt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE
0000000000686700 V _ZTSN5torch3jit14SourceLocationE
00000000006a3a00 V _ZTSN5torch3jit20StringSourceLocationE
00000000006a3be0 V _ZTSSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE
00000000009efd40 V _ZTVN5torch3jit14SourceLocationE
00000000009ef508 V _ZTVN5torch3jit20StringSourceLocationE
00000000009ef568 V _ZTVSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE
0000000000011f40 T _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
0000000000011f40 T _ZN3c105ErrorC2ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
000000000000ff00 T _ZN3c107Warning13print_warningERKNS_14SourceLocationEPKc
0000000000011060 T _ZN3c107Warning19set_warning_handlerEPFvRKNS_14SourceLocationEPKcE
0000000000011040 T _ZN3c107Warning4warnENS_14SourceLocationENSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
Note the __cxx11 in those symbols: it means they were built with the CXX11 ABI.
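The same check can be scripted. This is only a sketch: the helper names are hypothetical, and it simply looks for the __cxx11 tag in each mangled name. A symbol whose signature involves no std::string carries no tag under either ABI, so a False result for a single symbol proves nothing by itself.

```python
def uses_cxx11_abi(symbol):
    """True if a mangled name mentions libstdc++'s __cxx11 inline namespace."""
    return "__cxx11" in symbol


def split_nm_line(line):
    """Split one line of nm output into (type_letter, symbol_name)."""
    parts = line.split()
    # Undefined symbols have no address column, so index from the end.
    return parts[-2], parts[-1]


# Two lines copied from the listing above.
sample = [
    "U _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE",
    "000000000071c4e0 T _ZN5torch3jit6tracer20recordSourceLocationEPNS0_4NodeE",
]
for line in sample:
    kind, name = split_nm_line(line)
    print(kind, uses_cxx11_abi(name), name)
```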
2. To make sure that it was indeed a CXX11_ABI problem, I checked an earlier version, v0.4.1, and ran the same command:
U _ZN3c105ErrorC1ENS_14SourceLocationERKSs
U _ZN3c105ErrorC1ENS_14SourceLocationERKSs
U _ZN3c107Warning4warnENS_14SourceLocationESs
000000000077f460 T _ZN5torch3jit6tracer20recordSourceLocationEPNS0_4NodeE
000000000077f470 T _ZN5torch3jit6tracer23setRecordSourceLocationEPFvPNS0_4NodeEE
000000000077f1b0 T _ZN5torch3jit6tracer27defaultRecordSourceLocationEPNS0_4NodeE
00000000006822a0 W _ZNSt14_Function_base13_Base_managerIZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS2_14SourceLocationEEN3c108ArrayRefIPNS2_5ValueEEENS8_IhEESB_EUlRSt6vectorINS7_6IValueESaISE_EEE_E10_M_managerERSt9_Any_dataRKSK_St18_Manager_operation
0000000000682080 W _ZNSt17_Function_handlerIFiRSt6vectorIN3c106IValueESaIS2_EEEZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS8_14SourceLocationEENS1_8ArrayRefIPNS8_5ValueEEENSD_IhEESG_EUlS5_E_E9_M_invokeERKSt9_Any_dataS5_
0000000000de27d0 V _ZTIN5torch3jit14SourceLocationE
0000000000dea930 V _ZTIZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS0_14SourceLocationEEN3c108ArrayRefIPNS0_5ValueEEENS6_IhEES9_EUlRSt6vectorINS5_6IValueESaISC_EEE_
000000000099a840 V _ZTSN5torch3jit14SourceLocationE
00000000009f6bc0 V _ZTSZN5torch3jit8CodeImpl12insertAssignESt10shared_ptrINS0_14SourceLocationEEN3c108ArrayRefIPNS0_5ValueEEENS6_IhEES9_EUlRSt6vectorINS5_6IValueESaISC_EEE_
0000000000de2a80 V _ZTVN5torch3jit14SourceLocationE
U _ZN3c105ErrorC1ENS_14SourceLocationERKSs
U _ZN3c107Warning4warnENS_14SourceLocationESs
U _ZN3c105ErrorC1ENS_14SourceLocationERKSs
U _ZN3c107Warning4warnENS_14SourceLocationESs
U _ZN3c105ErrorC1ENS_14SourceLocationERKSs
U _ZN3c107Warning19set_warning_handlerEPFvRKNS_14SourceLocationEPKcE
U _ZN3c107Warning4warnENS_14SourceLocationESs
U _ZN5torch3jit6tracer20recordSourceLocationEPNS0_4NodeE
U _ZN5torch3jit6tracer23setRecordSourceLocationEPFvPNS0_4NodeEE
000000000039c3d0 T _ZN5torch3jit6tracer26pythonRecordSourceLocationEPNS0_4NodeE
00000000003a0210 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE10_M_destroyEv
00000000003a01d0 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv
00000000003a03d0 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE14_M_get_deleterERKSt9type_info
00000000003a0220 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EED0Ev
00000000003a01c0 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EED1Ev
00000000003a01c0 W _ZNSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EED2Ev
0000000000a0bdb0 V _ZTIN5torch3jit14SourceLocationE
0000000000a0f750 V _ZTIN5torch3jit20StringSourceLocationE
0000000000a0f7d0 V _ZTISt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE
00000000006a0bc0 V _ZTSN5torch3jit14SourceLocationE
00000000006be5c0 V _ZTSN5torch3jit20StringSourceLocationE
00000000006be7c0 V _ZTSSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE
0000000000a0bf00 V _ZTVN5torch3jit14SourceLocationE
0000000000a0f820 V _ZTVN5torch3jit20StringSourceLocationE
0000000000a0f8a0 V _ZTVSt23_Sp_counted_ptr_inplaceIN5torch3jit20StringSourceLocationESaIS2_ELN9__gnu_cxx12_Lock_policyE2EE
0000000000012dd0 T _ZN3c105ErrorC1ENS_14SourceLocationERKSs
0000000000012dd0 T _ZN3c105ErrorC2ENS_14SourceLocationERKSs
00000000000108c0 T _ZN3c107Warning13print_warningERKNS_14SourceLocationEPKc
0000000000010cb0 T _ZN3c107Warning19set_warning_handlerEPFvRKNS_14SourceLocationEPKcE
0000000000010c90 T _ZN3c107Warning4warnENS_14SourceLocationESs
As you can see, this build does contain the symbol _ZN3c105ErrorC1ENS_14SourceLocationERKSs.
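To illustrate why the mismatch shows up as an undefined symbol, here is a small sketch that compares what one binary exports with what another one asks for. The parse_nm helper is hypothetical, and the "extension" line just reuses an old-ABI symbol from the second listing to stand in for an extension built with the old ABI.

```python
def parse_nm(lines):
    """Return (defined, undefined) sets of symbol names from nm output lines."""
    defined, undefined = set(), set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue
        kind, name = parts[-2], parts[-1]
        (undefined if kind == "U" else defined).add(name)
    return defined, undefined


# What the v1.0.0 library built with the new ABI exports (from the first listing).
library = [
    "0000000000011f40 T _ZN3c105ErrorC1ENS_14SourceLocationERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE",
]
# Stand-in for an extension built with the old ABI (symbol from the second listing).
extension = [
    "U _ZN3c105ErrorC1ENS_14SourceLocationERKSs",
]

exported, _ = parse_nm(library)
_, wanted = parse_nm(extension)
missing = wanted - exported
print(missing)
```

Both names refer to the same constructor, but because the std::string argument is mangled differently under each ABI, the loader cannot match them.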
Then I rebuilt pytorch from source with export CFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0 $CFLAGS".
The best way to solve this problem in any case is to compile Pytorch from source and use that same compiler for the extension. Then all problems go away.
But it’s not easy for everyone (including me) to compile pytorch from source.
So I ran conda install pytorch torchvision cudatoolkit=10.0 -c pytorch
to install from the pytorch channel, instead of running conda install pytorch torchvision cudatoolkit=10.0
With the former, torch._C._GLIBCXX_USE_CXX11_ABI is False; with the latter, it is True.
I built my pytorch from source with _GLIBCXX_USE_CXX11_ABI set to False. Then I built the extension, but when I run it, this error occurs:
Traceback (most recent call last):
File "/ssd2/exec/xiaoyunlong/code/retinanet-examples/retinanet/main.py", line 10, in <module>
from retinanet import infer, train, utils
File "/ssd2/exec/xiaoyunlong/anaconda3/lib/python3.7/site-packages/retinanet/infer.py", line 13, in <module>
from .model import Model
File "/ssd2/exec/xiaoyunlong/anaconda3/lib/python3.7/site-packages/retinanet/model.py", line 8, in <module>
from ._C import Engine
ImportError: /ssd2/exec/xiaoyunlong/anaconda3/lib/python3.7/site-packages/retinanet/_C.so: undefined symbol: _ZN2cv8fastFreeEPv
I have checked _GLIBCXX_USE_CXX11_ABI via torch._C._GLIBCXX_USE_CXX11_ABI, and it outputs False.
@IceSuger _ZN2cv8fastFreeEPv (a.k.a. cv::fastFree(void*)) is a symbol from opencv. Did you check that opencv is correctly linked? And is opencv compiled using the same compiler as pytorch?
Thank you for your quick reply!
After a whole day of trying and failing to compile and install opencv, I tried conda install for pytorch and opencv, and pip install for the retinanet package (https://github.com/NVIDIA/retinanet-examples, which contains a pytorch c++ extension).
But the result is the same, undefined symbol: _ZN2cv8fastFreeEPv.
So I guess pytorch, opencv and the extension must all be compiled from source with the same version of gcc?
Also, I am wondering how to check whether my opencv is correctly linked? (When I import cv2 in the python installed with anaconda3, no error occurs.)
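One possible way to look into this, sketched under assumptions: run ldd on the extension's .so and see which OpenCV libraries it lists and whether they resolve. The opencv_deps helper and the sample text below are hypothetical; only the general shape of ldd output ("name => path (address)" or "name => not found") is relied on.

```python
def opencv_deps(ldd_output):
    """Map each OpenCV library named in ldd output to its resolved path (None if not found)."""
    deps = {}
    for line in ldd_output.splitlines():
        if "=>" not in line:
            continue
        name, target = (part.strip() for part in line.split("=>", 1))
        if "opencv" not in name:
            continue
        deps[name] = None if target.startswith("not found") else target.split()[0]
    return deps


# Hypothetical ldd output, for illustration only.
sample = """\
\tlinux-vdso.so.1 (0x00007ffd00000000)
\tlibopencv_core.so.3.4 => not found
\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)
"""
print(opencv_deps(sample))
```

Real input could come from subprocess.run(["ldd", path], capture_output=True, text=True).stdout. An empty result would suggest the extension was not linked against OpenCV at all, which could also explain an undefined cv::fastFree even though import cv2 works.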
Maybe late, but I hope this will help others solve it.
If you use pytorch-1.0.x, _GLIBCXX_USE_CXX11_ABI is automatically set to 0, please check here. Even if you export another _GLIBCXX_USE_CXX11_ABI value in the shell or add an extra compile argument in setup.py, it will be overridden. In pytorch-1.1, however, _GLIBCXX_USE_CXX11_ABI is set to the same value as torch._C._GLIBCXX_USE_CXX11_ABI, please check here