Libtorch crashes docker when included in header file

I’ve been working on integrating libtorch C++ into my project for a few weeks. When I include the

#include <torch/script.h>
#include <torch/torch.h>

headers in a .cpp source file, everything works fine. But when I include them in a header file, my Docker environment crashes with the following error log.

Log File

-- The CXX compiler identification is GNU 10.5.0
-- The CUDA compiler identification is NVIDIA 12.2.140
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
Sim Mode: Disabled
-- The C compiler identification is GNU 10.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
RoveComm Build Mode: 1
RoveComm_CPP -- LIBRARY MODE
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found suitable exact version "12.2") 
-- Found OpenCV: /usr/local (found version "4.9.0") 
-- Reading /usr/local/lib/cmake/GeographicLib/geographiclib-config.cmake
-- GeographicLib configuration, version 2.3
--   ${GeographicLib_LIBRARIES} set to shared library
-- Looking for sgemm_
-- Looking for sgemm_ - not found
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found BLAS: /usr/lib/x86_64-linux-gnu/libopenblas.so  
-- Found CUDA: /usr/local/cuda (found suitable version "12.2", minimum required is "12") 
-- Found CUDAToolkit: /usr/local/cuda/include (found suitable version "12.2.140", minimum required is "12") 
-- Found CUDA: /usr/local/cuda (found version "12.2") 
-- Found CUDAToolkit: /usr/local/cuda/include (found version "12.2.140") 
-- Caffe2: CUDA detected: 12.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda
-- Caffe2: Header version is: 12.2
-- /usr/local/cuda-12.2/targets/x86_64-linux/lib/libnvrtc.so shorthash is 000ca627
-- Found CUDNN: /usr/local/cuda/lib64/libcudnn.so  
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s):  8.6
-- Added CUDA NVCC flags for: -gencode;arch=compute_86,code=sm_86
-- Found Torch: /usr/lib/libtorch.so  
-- Found TENSORFLOWLITE: /usr/local/lib/libtensorflowlite.so  
-- Found LIBEDGETPU: /usr/local/lib/libedgetpu.so.1.0  
-- Configuring done (1.8s)
-- Generating done (0.0s)
-- Build files have been written to: /workspaces/Autonomy_Software/build
[  2%] Building CXX object external/rovecomm/CMakeFiles/RoveComm_CPP.dir/src/RoveComm/RoveCommGlobals.cpp.o
[  5%] Building CXX object external/rovecomm/CMakeFiles/RoveComm_CPP.dir/src/RoveComm/RoveCommPacket.cpp.o
[  7%] Building CXX object external/rovecomm/CMakeFiles/RoveComm_CPP.dir/src/RoveComm/RoveCommTCP.cpp.o
[ 10%] Building CXX object external/rovecomm/CMakeFiles/RoveComm_CPP.dir/src/RoveComm/RoveCommUDP.cpp.o
[ 13%] Linking CXX static library libRoveComm_CPP.a
[ 13%] Built target RoveComm_CPP
[ 15%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/AutonomyGlobals.cpp.o
[ 18%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/AutonomyLogging.cpp.o
[ 21%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/AutonomyNetworking.cpp.o
[ 23%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/algorithms/controllers/PIDController.cpp.o
[ 26%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/IdentitySoftware.cpp.o
[ 31%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/algorithms/planners/AStar.cpp.o
[ 31%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/algorithms/controllers/StanleyController.cpp.o
[ 34%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/drivers/MultimediaBoard.cpp.o
[ 36%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/drivers/DriveBoard.cpp.o
[ 39%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/drivers/NavigationBoard.cpp.o
[ 42%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/handlers/CameraHandler.cpp.o
[ 44%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/handlers/RecordingHandler.cpp.o
[ 47%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/handlers/ObjectDetectionHandler.cpp.o
[ 50%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/handlers/TagDetectionHandler.cpp.o
[ 52%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/handlers/StateMachineHandler.cpp.o
[ 55%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/handlers/WaypointHandler.cpp.o
[ 57%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/main.cpp.o
[ 60%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/ApproachingObjectState.cpp.o
[ 63%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/ApproachingMarkerState.cpp.o
[ 65%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/AvoidanceState.cpp.o
[ 68%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/IdleState.cpp.o
[ 71%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/NavigatingState.cpp.o
[ 73%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/ReversingState.cpp.o
[ 76%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/SearchPatternState.cpp.o
[ 78%] Building CXX object CMakeFiles/Autonomy_Software.dir/src/states/StuckState.cpp.o
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [CMakeFiles/Autonomy_Software.dir/build.make:272: CMakeFiles/Autonomy_Software.dir/src/handlers/TagDetectionHandler.cpp.o] Error 1
make[2]: *** Waiting for unfinished jobs....
c++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [CMakeFiles/Autonomy_Software.dir/build.make:314: CMakeFiles/Autonomy_Software.dir/src/states/ApproachingMarkerState.cpp.o] Error 1

I am having the same issue and can replicate it inside of a VSCode devcontainer. Including torch.h or script.h in any .h or .hpp file causes the gcc build process to freeze and the container to crash.

I thought I was the only one, so I’m glad I finally found someone with the same issue. It’s like clockwork: as soon as I throw either torch.h or script.h into any .h or .hpp file, the whole gcc build process just grinds to a halt. It’s like it’s hitting a brick wall every time.

Which versions of PyTorch are you using? I’ve tried every release from PyTorch 2.0.0 through PyTorch 2.2.2.

I believe I was originally on version 2.0.1 and then I switched to the latest version 2.2.2.

I tried PyTorch 2.2.2 first, then encountered the same issue with PyTorch 2.2.0.

This issue has been killing me. I am glad someone has finally reported it. The sooner it is fixed, the better.

I have the same issue. Has anyone found any sort of fix yet?

I’m just wondering if moving those lines to your header file makes the compilation much more memory-intensive, since every source file that includes that header then has to parse the libtorch headers too, and you’re running out of RAM.

While it is compiling, can you check your RAM usage (via top or htop or something)?
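One way to capture that number without watching top by hand is sketched below. It is only a rough sketch for Linux: the function names `mem_available_mib` and `watch_build_memory` are made up for this example, and it reads `MemAvailable` from `/proc/meminfo`, which inside a container usually reports the host's memory rather than any limit placed on the container.

```shell
# Print currently available memory in MiB (Linux only, reads /proc/meminfo).
mem_available_mib() {
    awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo
}

# Run a command while sampling available memory once per second,
# then report the lowest value seen. Hypothetical helper for illustration.
watch_build_memory() {
    log=$(mktemp)
    # Background sampler: one reading per second until it is killed.
    ( while :; do mem_available_mib; sleep 1; done ) > "$log" &
    sampler=$!

    # Run the build in the foreground and remember its exit status.
    status=0
    "$@" || status=$?

    # Stop the sampler and suppress the shell's "Terminated" notice.
    kill "$sampler" 2>/dev/null
    wait "$sampler" 2>/dev/null || true

    # Smallest reading, including one final sample taken right now.
    lowest=$( { cat "$log"; mem_available_mib; } | sort -n | head -n 1 )
    rm -f "$log"
    echo "lowest available memory seen: ${lowest} MiB" >&2
    return "$status"
}

# Example (not executed here): watch_build_memory make -j4
```

If the lowest value gets close to zero just before cc1plus is killed, that would point to the build running out of memory.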

Try reducing the number of concurrent jobs in the build system.

As far as I can see, you are using CMake + make.

After you call CMake to generate the build files, run make with the -j flag followed by the number of concurrent jobs (for example, make -j4). By default make runs one command at a time, but something may be increasing that. You may have the cores but not enough usable RAM to handle the load.

If you limit the container’s RAM with Docker’s -m flag, the Linux out-of-memory killer is triggered when usage goes over the limit.

Compiling often uses much more RAM in the linking phase, so a symptom of this is the process failing just before the build finishes.
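As a rough illustration of picking a job count that fits in memory, the sketch below caps the number of jobs by both core count and available RAM. The helper name `pick_jobs` is hypothetical, and the 2048 MiB per compile job used in the example is a placeholder guess, not a measured figure for libtorch; measure your own build before relying on it.

```shell
# Usage: pick_jobs CORES AVAILABLE_MIB PER_JOB_MIB
# Prints the smaller of CORES and AVAILABLE_MIB / PER_JOB_MIB, but never less than 1.
pick_jobs() {
    cores=$1
    avail=$2
    per_job=$3

    # How many jobs fit in memory, using integer division.
    by_mem=$(( avail / per_job ))
    if [ "$by_mem" -lt 1 ]; then
        by_mem=1
    fi

    # Never start more jobs than there are cores.
    if [ "$by_mem" -lt "$cores" ]; then
        echo "$by_mem"
    else
        echo "$cores"
    fi
}

# Example (not executed here), assuming 2048 MiB per job:
#   avail=$(awk '/^MemAvailable:/ { print int($2 / 1024) }' /proc/meminfo)
#   make -j"$(pick_jobs "$(nproc)" "$avail" 2048)"
```

The same job count can also be passed through CMake itself with `cmake --build <build-dir> --parallel N`, which forwards it to the underlying build tool.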

How much RAM usage is normal when compiling a project with torch? I’m using libtorch to add the library to my project. My project isn’t very big, but takes much longer to compile sequentially, and I don’t think that building sequentially is a reasonable solution.

My machine has 64 gigs of RAM, and the project compiles in under a minute without issue when not including torch. Could there be something configured incorrectly in my CMakeLists.txt, in the way I’m including libtorch, that causes the build to use so much memory?

Cmakelists.txt

So I have done some more testing and have noticed a few things.

  1. If I use cmake and then run just the make command, it crashes, maxing out both the CPU and RAM on my system.

  2. If I use cmake and then run anything from make -j1 all the way up to make -j12, it works just fine and doesn’t crash my system. The larger the number I pass to -j, the faster it builds.

And when it fails, it fails before reaching the linking step, typically while compiling the file(s) that include the libtorch headers.
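A plain make crashing while make -j12 works is surprising, because make on its own normally runs one job at a time. One possible explanation, and it is only a guess, is that a -j with no number (which GNU make treats as "no limit on jobs") is being picked up from the MAKEFLAGS environment variable or from a shell alias. The sketch below checks for that; `has_jobs_flag` is a hypothetical helper, and it only recognizes the dashed spellings of the option.

```shell
# Hypothetical helper: report whether a flags string contains -j or --jobs.
# Only matches the dashed forms (e.g. "-j", "-j8", "--jobs=8").
has_jobs_flag() {
    case " $1 " in
        *" -j"*|*" --jobs"*) echo yes ;;
        *) echo no ;;
    esac
}

# Inspect the current environment (MAKEFLAGS may be unset, hence the default).
echo "MAKEFLAGS has a jobs flag: $(has_jobs_flag "${MAKEFLAGS:-}")"

# Show how this shell resolves "make" (alias, function, or path to a binary).
command -v make || echo "make not found on PATH"
```

If this reports a jobs flag without a number, passing an explicit count such as make -j12 would override it, which would be consistent with the behavior described above.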