Linker errors when building PyTorch in NGC container

My goal is to build a PyTorch package for Python 3.9 inside NVIDIA’s NGC container nvcr.io/nvidia/pytorch:23.02-py3 (PyTorch Release 23.02 - NVIDIA Docs). The binaries that come with the container are for Python 3.8 only.

I found an example Dockerfile /workspace/docker-examples/Dockerfile.custompytorch inside the container. I installed python3.9 interactively and used a command line lifted from that Dockerfile to compile the PyTorch sources shipped with the container:

root@73e2508e7681:/opt/pytorch# apt update
root@73e2508e7681:/opt/pytorch# DEBIAN_FRONTEND=noninteractive apt install python3.9 python3.9-dev python3.9-venv
root@73e2508e7681:/opt/pytorch# curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
root@73e2508e7681:/opt/pytorch# python3.9 get-pip.py

root@73e2508e7681:/opt/pytorch/pytorch# CUDA_HOME="/usr/local/cuda" CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" NCCL_INCLUDE_DIR="/usr/include/" NCCL_LIB_DIR="/usr/lib/" USE_SYSTEM_NCCL=1 USE_OPENCV=1 python3.9 -m pip install --no-cache-dir -v .

Note that the example Dockerfile appears to be somewhat outdated as conda doesn’t come with the container.
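
Since conda is not installed here, the CMAKE_PREFIX_PATH expression in the command above cannot resolve to a real conda prefix. A minimal sketch of a guard follows, assuming the build does not need that variable when conda is absent. `conda_prefix` is a hypothetical helper and not part of the example Dockerfile:

```shell
# Hypothetical helper: print the conda installation prefix, or nothing
# when conda is not on PATH.
conda_prefix() {
    conda_bin=$(command -v conda) || return 0
    # <prefix>/bin/conda -> <prefix> (two directory levels up)
    dirname "$(dirname "$conda_bin")"
}

prefix=$(conda_prefix)
if [ -n "$prefix" ]; then
    export CMAKE_PREFIX_PATH="$prefix"
else
    echo "conda not found; leaving CMAKE_PREFIX_PATH untouched" >&2
fi
```

With this in place the variable is only exported when a conda installation actually exists.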

The build fails for me with linker errors related to symbols in at::cuda::detail:

 [4883/6964] Linking CXX executable bin/c10_InlineDeviceGuard_test
  FAILED: bin/c10_InlineDeviceGuard_test
  : && /bin/c++ -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow -O3 -DNDEBUG -DNDEBUG -rdynamic -Wl,-rpath -Wl,/opt/hpcx/ompi/lib -Wl,--enable-new-dtags -pthread -Wl,-rpath-link,/usr/lib/x86_64-linux-gnu c10/test/CMakeFiles/c10_InlineDeviceGuard_test.dir/core/impl/InlineDeviceGuard_test.cpp.o -o bin/c10_InlineDeviceGuard_test  -Wl,-rpath,/opt/pytorch/pytorch/build/lib:  lib/libc10.so  lib/libgmock.a  lib/libgtest.a  lib/libgtest_main.a  lib/libgtest.a  -pthread && :
  /usr/bin/ld: c10/test/CMakeFiles/c10_InlineDeviceGuard_test.dir/core/impl/InlineDeviceGuard_test.cpp.o: in function `InlineDeviceGuard_Constructor_Test::TestBody()':
  InlineDeviceGuard_test.cpp:(.text+0x14c1): undefined reference to `at::cuda::detail::hasPrimaryContext(long)'
  /usr/bin/ld: InlineDeviceGuard_test.cpp:(.text+0x14e1): undefined reference to `at::cuda::detail::hasPrimaryContext(long)'
  /usr/bin/ld: c10/test/CMakeFiles/c10_InlineDeviceGuard_test.dir/core/impl/InlineDeviceGuard_test.cpp.o: in function `InlineDeviceGuard_SetDevice_Test::TestBody()':
  InlineDeviceGuard_test.cpp:(.text+0x1912): undefined reference to `at::cuda::detail::hasPrimaryContext(long)'
  /usr/bin/ld: InlineDeviceGuard_test.cpp:(.text+0x19ac): undefined reference to `at::cuda::detail::hasPrimaryContext(long)'
  /usr/bin/ld: c10/test/CMakeFiles/c10_InlineDeviceGuard_test.dir/core/impl/InlineDeviceGuard_test.cpp.o: in function `InlineDeviceGuard_ResetDevice_Test::TestBody()':
  InlineDeviceGuard_test.cpp:(.text+0x1f42): undefined reference to `at::cuda::detail::hasPrimaryContext(long)'
  /usr/bin/ld: c10/test/CMakeFiles/c10_InlineDeviceGuard_test.dir/core/impl/InlineDeviceGuard_test.cpp.o:InlineDeviceGuard_test.cpp:(.text+0x1fdc): more undefined references to `at::cuda::detail::hasPrimaryContext(long)' follow
  collect2: error: ld returned 1 exit status

How was the torch package built that comes with the container? Is it necessary to set up a conda environment?

You might need to use BUILD_TEST=0 during your build.

Thanks for the quick reply! With BUILD_TEST=0 the build does indeed complete and produces a wheel.

root@73e2508e7681:/opt/pytorch/pytorch# CUDA_HOME="/usr/local/cuda" CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" NCCL_INCLUDE_DIR="/usr/include/" NCCL_LIB_DIR="/usr/lib/" USE_SYSTEM_NCCL=1 USE_OPENCV=1 BUILD_TEST=0 python3.9 -m pip install --no-cache-dir -v .

However, I get a segmentation fault on import.

root@73e2508e7681:~# python3.9 -m pip list | grep torch
torch             1.14.0a0+44dac51
root@73e2508e7681:~# python3.9 -c 'import torch'
Segmentation fault

Can’t see more in the debugger unfortunately:

root@73e2508e7681:~# gdb --args python3.9 -c 'import torch'
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04.1) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python3.9...
(No debugging symbols found in python3.9)
(gdb) run
Starting program: /usr/bin/python3.9 -c import\ torch
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
0x00000000005dc0e4 in ?? ()

I don’t know what’s causing the segfault without seeing any backtrace.
Which Python 3.9 feature do you need that you want to rebuild PyTorch inside this container?

I got an extra sliver of information by installing a Python interpreter with debug symbols:

root@73e2508e7681:~# apt install python3.9-dbg
root@73e2508e7681:~# gdb --args python3.9-dbg -c 'import torch'
Reading symbols from python3.9-dbg...
(gdb) run
Starting program: /usr/bin/python3.9-dbg -c import\ torch
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
_Py_INCREF (op=0x0) at ../Include/object.h:408
408     ../Include/object.h: No such file or directory.

So it’s related to reference counting, but that’s still pretty generic.
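
The faulting frame alone rarely identifies the culprit; the full call stack usually says more. Here is a sketch of capturing it in one non-interactive step with gdb's `-batch` and `-ex` options. The `backtrace_on_crash` wrapper and the `GDB` override variable are names made up for illustration:

```shell
# Hypothetical wrapper: run a command under gdb and print a backtrace
# once the program stops (for example on SIGSEGV).
# GDB may be set to point at a different debugger binary; defaults to gdb.
backtrace_on_crash() {
    "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}
```

It could be invoked as `backtrace_on_crash python3.9-dbg -c 'import torch'`. If the program exits normally, gdb has no stack to print.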

Which Python 3.9 feature do you need that you want to rebuild PyTorch inside this container?

This is for a large, existing Python code base. Apart from new features related to type annotations in Python 3.9 it probably also depends on some changes in the standard library.

My goal was to get a PyTorch binary with good support for CUDA 12 and Hopper GPUs.

Sorry, but I don’t have a clue what might be causing it. Did you install Python==3.9 via apt or did you pull conda into the container to create a new env?

This is with the python3.9 package from apt (Python version 3.9.5). I just used the global root environment for that interpreter and didn’t pull in any conda at all.

I wasn’t thinking quite straight yesterday; I should have let gdb print the backtrace right away. Here it is:

root@73e2508e7681:~# gdb --args python3.9-dbg -c 'import torch'
Reading symbols from python3.9-dbg...
(gdb) run
Starting program: /usr/bin/python3.9-dbg -c import\ torch
warning: Error disabling address space randomization: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

Program received signal SIGSEGV, Segmentation fault.
_Py_INCREF (op=0x0) at ../Include/object.h:408
408     ../Include/object.h: No such file or directory.
(gdb) set pagination off
(gdb) bt
#0  _Py_INCREF (op=0x0) at ../Include/object.h:408
#1  type_qualname (type=0x147ebd0, context=0x0) at ../Objects/typeobject.c:479
#2  0x000000000065b2fb in getset_get (descr=descr@entry=0x7f192489fd10, obj=obj@entry=0x147ebd0, type=type@entry=0x45ebc90) at ../Objects/descrobject.c:185
#3  0x0000000000491d14 in type_getattro (type=0x147ebd0, name=0x7f18794f2b80) at ../Objects/typeobject.c:3345
#4  0x000000000046b155 in PyObject_GetAttr (v=v@entry=0x147ebd0, name=name@entry=0x7f18794f2b80) at ../Objects/object.c:890
#5  0x000000000046ce4a in PyObject_GetAttrString (v=0x147ebd0, name=<optimized out>) at ../Objects/object.c:795
#6  0x00007f18dbb487bd in pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr>::get_cache() const () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so
#7  0x00007f18dbb4a9c1 in pybind11::cpp_function::initialize_generic(std::unique_ptr<pybind11::detail::function_record, pybind11::cpp_function::InitializingFunctionRecordDeleter>&&, char const*, std::type_info const* const*, unsigned long) () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so
#8  0x00007f18dc2a25bc in pybind11::enum_<onnx_torch::TensorProto_DataType>::enum_<>(pybind11::handle const&, char const*) () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so
#9  0x00007f18dc29cdee in torch::onnx::initONNXBindings(_object*) () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so
#10 0x00007f18dbe479f4 in initModule () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so
#11 0x000000000050f245 in _PyImport_LoadDynamicModuleWithSpec (spec=spec@entry=0x7f1924313910, fp=fp@entry=0x0) at ../Python/importdl.c:164
#12 0x000000000050cf8b in _imp_create_dynamic_impl (module=module@entry=0x7f1924819b90, spec=0x7f1924313910, file=<optimized out>) at ../Python/import.c:2297
#13 0x000000000050d11d in _imp_create_dynamic (module=0x7f1924819b90, args=0x7f1924313568, nargs=1) at ../Python/clinic/import.c.h:330
#14 0x000000000066a8fa in cfunction_vectorcall_FASTCALL (func=0x7f1924822b90, args=0x7f1924313568, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/methodobject.c:426
#15 0x000000000043302e in PyVectorcall_Call (callable=callable@entry=0x7f1924822b90, tuple=tuple@entry=0x7f1924313550, kwargs=kwargs@entry=0x7f1924311d10) at ../Include/object.h:630
#16 0x0000000000433316 in _PyObject_Call (tstate=0x13d29c0, callable=callable@entry=0x7f1924822b90, args=args@entry=0x7f1924313550, kwargs=kwargs@entry=0x7f1924311d10) at ../Objects/call.c:266
#17 0x000000000043338a in PyObject_Call (callable=callable@entry=0x7f1924822b90, args=args@entry=0x7f1924313550, kwargs=kwargs@entry=0x7f1924311d10) at ../Objects/call.c:293
#18 0x00000000004dfc98 in do_call_core (tstate=tstate@entry=0x13d29c0, func=func@entry=0x7f1924822b90, callargs=callargs@entry=0x7f1924313550, kwdict=kwdict@entry=0x7f1924311d10) at ../Python/ceval.c:5092
#19 0x00000000004ec2d1 in _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x14ef110, throwflag=<optimized out>) at ../Python/ceval.c:3580
#20 0x00000000004ee225 in _PyEval_EvalFrame (throwflag=0, f=0x14ef110, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#21 _PyEval_EvalCode (tstate=0x13d29c0, _co=0x7f19248865f0, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=2, kwnames=0x0, kwargs=0x14a7b70, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7f192486dd60, qualname=0x7f192486dd60) at ../Python/ceval.c:4327
#22 0x00000000004334db in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Include/object.h:630
#23 0x00000000004eb570 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775810, args=0x14a7b60, callable=0x7f192481f410, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#24 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775810, args=0x14a7b60, callable=0x7f192481f410) at ../Include/cpython/abstract.h:127
#25 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#26 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x14a79d0, throwflag=<optimized out>) at ../Python/ceval.c:3487
#27 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x14a79d0, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#28 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x7f19245ee8e0, nargs=2, globals=<optimized out>) at ../Objects/call.c:330
#29 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#30 0x00000000004eb7ef in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775810, args=0x7f19245ee8d0, callable=0x7f19247f5e10, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#31 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775810, args=0x7f19245ee8d0, callable=0x7f19247f5e10) at ../Include/cpython/abstract.h:127
#32 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#33 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x7f19245ee750, throwflag=<optimized out>) at ../Python/ceval.c:3504
#34 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x7f19245ee750, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#35 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x7f1924814bd8, nargs=1, globals=<optimized out>) at ../Objects/call.c:330
#36 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#37 0x00000000004eba55 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x7f1924814bd0, callable=0x7f192481feb0, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#38 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x7f1924814bd0, callable=0x7f192481feb0) at ../Include/cpython/abstract.h:127
#39 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#40 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x7f1924814a50, throwflag=<optimized out>) at ../Python/ceval.c:3518
#41 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x7f1924814a50, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#42 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x14d7ab0, nargs=1, globals=<optimized out>) at ../Objects/call.c:330
#43 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#44 0x00000000004eba55 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x14d7aa8, callable=0x7f1924820190, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#45 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x14d7aa8, callable=0x7f1924820190) at ../Include/cpython/abstract.h:127
#46 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#47 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x14d78f0, throwflag=<optimized out>) at ../Python/ceval.c:3518
#48 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x14d78f0, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#49 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x14a7518, nargs=2, globals=<optimized out>) at ../Objects/call.c:330
#50 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#51 0x00000000004eba55 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775810, args=0x14a7508, callable=0x7f1924823550, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#52 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775810, args=0x14a7508, callable=0x7f1924823550) at ../Include/cpython/abstract.h:127
#53 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#54 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x14a7370, throwflag=<optimized out>) at ../Python/ceval.c:3518
#55 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x14a7370, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#56 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x7ffc4d2e36e0, nargs=2, globals=<optimized out>) at ../Objects/call.c:330
#57 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#58 0x0000000000434104 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7ffc4d2e36d0, callable=0x7f19248235f0, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#59 object_vacall (tstate=tstate@entry=0x13d29c0, base=base@entry=0x0, callable=0x7f19248235f0, vargs=vargs@entry=0x7ffc4d2e3748) at ../Objects/call.c:792
#60 0x0000000000434407 in _PyObject_CallMethodIdObjArgs (obj=0x0, name=name@entry=0x98a960 <PyId__find_and_load.17741>) at ../Objects/call.c:883
#61 0x0000000000509cf7 in import_find_and_load (tstate=tstate@entry=0x13d29c0, abs_name=abs_name@entry=0x7f1924730160) at ../Python/import.c:1771
#62 0x000000000050e0f1 in PyImport_ImportModuleLevelObject (name=name@entry=0x7f1924730160, globals=<optimized out>, locals=<optimized out>, fromlist=fromlist@entry=0x7f192471f190, level=0) at ../Python/import.c:1872
#63 0x00000000004dc88b in import_name (tstate=tstate@entry=0x13d29c0, f=f@entry=0x1424a20, name=name@entry=0x7f1924730160, fromlist=fromlist@entry=0x7f192471f190, level=level@entry=0x7f19248d3280) at ../Python/ceval.c:5193
#64 0x00000000004e955f in _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x1424a20, throwflag=<optimized out>) at ../Python/ceval.c:3097
#65 0x00000000004ee225 in _PyEval_EvalFrame (throwflag=0, f=0x1424a20, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#66 _PyEval_EvalCode (tstate=0x13d29c0, _co=0x7f1924729a00, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=argcount@entry=0, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at ../Python/ceval.c:4327
#67 0x00000000004ee393 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=argcount@entry=0, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at ../Python/ceval.c:4359
#68 0x00000000004ee3d6 in PyEval_EvalCodeEx (_co=_co@entry=0x7f1924729a00, globals=globals@entry=0x7f192470f290, locals=locals@entry=0x7f192470f290, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at ../Python/ceval.c:4375
#69 0x00000000004ee408 in PyEval_EvalCode (co=co@entry=0x7f1924729a00, globals=globals@entry=0x7f192470f290, locals=locals@entry=0x7f192470f290) at ../Python/ceval.c:826
#70 0x000000000069d3a2 in builtin_exec_impl (module=module@entry=0x7f192487db90, source=0x7f1924729a00, globals=0x7f192470f290, locals=0x7f192470f290) at ../Python/bltinmodule.c:1035
#71 0x000000000069d4c4 in builtin_exec (module=0x7f192487db90, args=0x7f1924731338, nargs=2) at ../Python/clinic/bltinmodule.c.h:396
#72 0x000000000066a8fa in cfunction_vectorcall_FASTCALL (func=0x7f192487e3b0, args=0x7f1924731338, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/methodobject.c:426
#73 0x000000000043302e in PyVectorcall_Call (callable=callable@entry=0x7f192487e3b0, tuple=tuple@entry=0x7f1924731320, kwargs=kwargs@entry=0x7f192470f2f0) at ../Include/object.h:630
#74 0x0000000000433316 in _PyObject_Call (tstate=0x13d29c0, callable=callable@entry=0x7f192487e3b0, args=args@entry=0x7f1924731320, kwargs=kwargs@entry=0x7f192470f2f0) at ../Objects/call.c:266
#75 0x000000000043338a in PyObject_Call (callable=callable@entry=0x7f192487e3b0, args=args@entry=0x7f1924731320, kwargs=kwargs@entry=0x7f192470f2f0) at ../Objects/call.c:293
#76 0x00000000004dfc98 in do_call_core (tstate=tstate@entry=0x13d29c0, func=func@entry=0x7f192487e3b0, callargs=callargs@entry=0x7f1924731320, kwdict=kwdict@entry=0x7f192470f2f0) at ../Python/ceval.c:5092
#77 0x00000000004ec2d1 in _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x7f19246fab30, throwflag=<optimized out>) at ../Python/ceval.c:3580
#78 0x00000000004ee225 in _PyEval_EvalFrame (throwflag=0, f=0x7f19246fab30, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#79 _PyEval_EvalCode (tstate=0x13d29c0, _co=0x7f19248865f0, globals=<optimized out>, locals=locals@entry=0x0, args=<optimized out>, argcount=3, kwnames=0x0, kwargs=0x7f1924791798, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7f192486dd60, qualname=0x7f192486dd60) at ../Python/ceval.c:4327
#80 0x00000000004334db in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Include/object.h:630
#81 0x00000000004eb570 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775811, args=0x7f1924791780, callable=0x7f192481f410, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#82 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775811, args=0x7f1924791780, callable=0x7f192481f410) at ../Include/cpython/abstract.h:127
#83 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#84 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x7f19247915f0, throwflag=<optimized out>) at ../Python/ceval.c:3487
#85 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x7f19247915f0, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#86 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x7f192475e7e0, nargs=2, globals=<optimized out>) at ../Objects/call.c:330
#87 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#88 0x00000000004eb7ef in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775810, args=0x7f192475e7d0, callable=0x7f19247f0a50, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#89 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775810, args=0x7f192475e7d0, callable=0x7f19247f0a50) at ../Include/cpython/abstract.h:127
#90 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#91 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x7f192475e650, throwflag=<optimized out>) at ../Python/ceval.c:3504
#92 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x7f192475e650, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#93 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x1414470, nargs=1, globals=<optimized out>) at ../Objects/call.c:330
#94 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#95 0x00000000004eba55 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x1414468, callable=0x7f1924820190, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#96 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775809, args=0x1414468, callable=0x7f1924820190) at ../Include/cpython/abstract.h:127
#97 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#98 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x14142b0, throwflag=<optimized out>) at ../Python/ceval.c:3518
#99 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x14142b0, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#100 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x7f19247151f8, nargs=2, globals=<optimized out>) at ../Objects/call.c:330
#101 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#102 0x00000000004eba55 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775810, args=0x7f19247151e8, callable=0x7f1924823550, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#103 PyObject_Vectorcall (kwnames=0x0, nargsf=9223372036854775810, args=0x7f19247151e8, callable=0x7f1924823550) at ../Include/cpython/abstract.h:127
#104 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<synthetic pointer>, tstate=0x13d29c0) at ../Python/ceval.c:5072
#105 _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x7f1924715050, throwflag=<optimized out>) at ../Python/ceval.c:3518
#106 0x0000000000432d15 in _PyEval_EvalFrame (throwflag=0, f=0x7f1924715050, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#107 function_code_fastcall (tstate=0x13d29c0, co=<optimized out>, args=0x7ffc4d2e4610, nargs=2, globals=<optimized out>) at ../Objects/call.c:330
#108 0x00000000004335d0 in _PyFunction_Vectorcall (func=<optimized out>, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at ../Objects/call.c:367
#109 0x0000000000434104 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=2, args=0x7ffc4d2e4600, callable=0x7f19248235f0, tstate=0x13d29c0) at ../Include/cpython/abstract.h:118
#110 object_vacall (tstate=tstate@entry=0x13d29c0, base=base@entry=0x0, callable=0x7f19248235f0, vargs=vargs@entry=0x7ffc4d2e4678) at ../Objects/call.c:792
#111 0x0000000000434407 in _PyObject_CallMethodIdObjArgs (obj=0x0, name=name@entry=0x98a960 <PyId__find_and_load.17741>) at ../Objects/call.c:883
#112 0x0000000000509cf7 in import_find_and_load (tstate=tstate@entry=0x13d29c0, abs_name=abs_name@entry=0x7f19247c96d0) at ../Python/import.c:1771
#113 0x000000000050e0f1 in PyImport_ImportModuleLevelObject (name=name@entry=0x7f19247c96d0, globals=<optimized out>, locals=<optimized out>, fromlist=fromlist@entry=0x982bc0 <_Py_NoneStruct>, level=0) at ../Python/import.c:1872
#114 0x00000000004dc88b in import_name (tstate=tstate@entry=0x13d29c0, f=f@entry=0x7f1924814850, name=name@entry=0x7f19247c96d0, fromlist=fromlist@entry=0x982bc0 <_Py_NoneStruct>, level=level@entry=0x7f19248d3280) at ../Python/ceval.c:5193
#115 0x00000000004e955f in _PyEval_EvalFrameDefault (tstate=0x13d29c0, f=0x7f1924814850, throwflag=<optimized out>) at ../Python/ceval.c:3097
#116 0x00000000004ee225 in _PyEval_EvalFrame (throwflag=0, f=0x7f1924814850, tstate=0x13d29c0) at ../Include/internal/pycore_ceval.h:40
#117 _PyEval_EvalCode (tstate=0x13d29c0, _co=0x7f19247d8450, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=argcount@entry=0, kwnames=0x0, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at ../Python/ceval.c:4327
#118 0x00000000004ee393 in _PyEval_EvalCodeWithName (_co=<optimized out>, globals=<optimized out>, locals=<optimized out>, args=<optimized out>, argcount=argcount@entry=0, kwnames=<optimized out>, kwargs=0x0, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x0, qualname=0x0) at ../Python/ceval.c:4359
#119 0x00000000004ee3d6 in PyEval_EvalCodeEx (_co=_co@entry=0x7f19247d8450, globals=globals@entry=0x7f19247cf410, locals=locals@entry=0x7f19247cf410, args=args@entry=0x0, argcount=argcount@entry=0, kws=kws@entry=0x0, kwcount=0, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0) at ../Python/ceval.c:4375
#120 0x00000000004ee408 in PyEval_EvalCode (co=co@entry=0x7f19247d8450, globals=globals@entry=0x7f19247cf410, locals=locals@entry=0x7f19247cf410) at ../Python/ceval.c:826
#121 0x000000000052796d in run_eval_code_obj (tstate=tstate@entry=0x13d29c0, co=co@entry=0x7f19247d8450, globals=globals@entry=0x7f19247cf410, locals=locals@entry=0x7f19247cf410) at ../Python/pythonrun.c:1219
#122 0x0000000000527df3 in run_mod (mod=mod@entry=0x1445958, filename=filename@entry=0x7f192470f0a0, globals=globals@entry=0x7f19247cf410, locals=locals@entry=0x7f19247cf410, flags=flags@entry=0x7ffc4d2e4c28, arena=arena@entry=0x7f19247cf760) at ../Python/pythonrun.c:1240
#123 0x000000000052a07a in PyRun_StringFlags (str=str@entry=0x7f19247c5fb0 "import torch\n", start=start@entry=257, globals=0x7f19247cf410, locals=0x7f19247cf410, flags=flags@entry=0x7ffc4d2e4c28) at ../Python/pythonrun.c:1106
#124 0x000000000052a101 in PyRun_SimpleStringFlags (command=0x7f19247c5fb0 "import torch\n", flags=flags@entry=0x7ffc4d2e4c28) at ../Python/pythonrun.c:496
#125 0x0000000000423181 in pymain_run_command (command=<optimized out>, cf=cf@entry=0x7ffc4d2e4c28) at ../Modules/main.c:246
#126 0x0000000000424052 in pymain_run_python (exitcode=exitcode@entry=0x7ffc4d2e4c5c) at ../Modules/main.c:589
#127 0x0000000000424209 in Py_RunMain () at ../Modules/main.c:677
#128 0x000000000042425e in pymain_main (args=args@entry=0x7ffc4d2e4ca0) at ../Modules/main.c:707
#129 0x00000000004242e2 in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at ../Modules/main.c:731
#130 0x0000000000422d63 in main (argc=<optimized out>, argv=<optimized out>) at ../Programs/python.c:15

And that explains what went wrong: somehow Python 3.8 got mixed in there (frame #10 0x00007f18dbe479f4 in initModule () from /usr/local/lib/python3.8/dist-packages/torch/lib/libtorch_python.so). I really need to separate the environments more cleanly.
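
The telltale sign is the `from ...` suffix on frames #6 to #10. Below is a sketch for pulling the distinct shared-library paths out of a backtrace so that such mix-ups stand out. `libs_in_backtrace` is a hypothetical helper and assumes gdb's usual `from /path/to/lib.so` suffix at the end of a frame line:

```shell
# Hypothetical helper: list the distinct shared libraries named in a gdb
# backtrace read from stdin (frame lines ending in "from /path/to/lib.so").
libs_in_backtrace() {
    grep -o ' from /[^ ]*$' | sed 's/^ from //' | sort -u
}
```

For example: `gdb -batch -ex run -ex bt --args python3.9-dbg -c 'import torch' 2>&1 | libs_in_backtrace`. Any python3.8 path in the output of a Python 3.9 process points to mixed environments.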

OK, the actual problem was that the container sets the LD_LIBRARY_PATH environment variable to include Python 3.8 specific library paths, so the Python 3.9 interpreter ended up loading the Python 3.8 build of libtorch_python.so.

    (py39venv) root@73e2508e7681:/workspace# echo $LD_LIBRARY_PATH
    /usr/local/cuda/compat/lib.real:/usr/local/lib/python3.8/dist-packages/torch/lib:/usr/local/lib/python3.8/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda-11/lib64

If I remove those paths, the torch package works fine:

(py39venv) root@73e2508e7681:/workspace# LD_LIBRARY_PATH=/usr/local/cuda/compat/lib.real:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/usr/local/cuda-11/lib64 python -c 'import torch' ; echo $?
0
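
The manual edit above can be automated. A minimal sketch that drops every search-path entry containing a given substring follows. `strip_path_entries` is a hypothetical helper, and matching on `/python3.8/` assumes that only the bundled Python 3.8 packages live under such paths:

```shell
# Hypothetical helper: remove entries from a colon-separated search path.
#   $1: the search path (e.g. the value of LD_LIBRARY_PATH)
#   $2: fixed substring; every entry containing it is dropped
strip_path_entries() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -v -F -- "$2" | paste -sd: -
}
```

Usage would look like `LD_LIBRARY_PATH=$(strip_path_entries "$LD_LIBRARY_PATH" /python3.8/) python -c 'import torch'`, which leaves the CUDA and driver entries in place.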

(This time I built a torch wheel and installed it into a separate py39 venv, but even before that the environments were commingled only through that dynamic library path.)

So I have a working method now to produce wheels for newer versions of Python in containers derived from the NGC container. Thanks for the support @ptrblck!

Great and thanks for the detailed update!