How to get the C++ call stack of a PyTorch crash?

yoelshoshan · August 18, 2018, 6:47am

What is the simplest way to get the C++ call stack when PyTorch crashes?

[Edit: I found this, which seems to help: https://stackoverflow.com/questions/28108851/catching-segfault-with-debugger-in-python]
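A complementary first step is Python's standard-library `faulthandler` module. On a fatal signal it prints the Python traceback, so you can see which Python line was running when the native code crashed. It does not print C++ frames, so a debugger is still needed for those. The sketch below assumes you can edit the entry point of the crashing script. The helper name `enable_crash_tracebacks` is hypothetical.

```python
import faulthandler
import sys


def enable_crash_tracebacks(all_threads=True):
    """Dump Python tracebacks to stderr if the process gets a fatal signal.

    faulthandler covers SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL. It shows
    Python frames only; C++ frames still require a debugger such as gdb.
    """
    # sys.__stderr__ is the original stderr; faulthandler needs a real file
    # descriptor, which a replaced sys.stderr object may not provide.
    faulthandler.enable(file=sys.__stderr__, all_threads=all_threads)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_tracebacks()
    # ... import torch and run the code that crashes here ...
```

You can get the same effect without editing code by running `python -X faulthandler script.py` or by setting `PYTHONFAULTHANDLER=1`.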

Is building PyTorch yourself the only way? And even then, is there a flag that must be set to make sure debug symbols are available?
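Before rebuilding, you can check whether an installed shared library already carries DWARF debug info. The sketch below does this by looking for a `.debug_info` section. It assumes a 64-bit ELF file on Linux and ignores corner cases such as extended section numbering and compressed `.zdebug_*` sections. It also ignores debug info shipped in a separate file, which gdb can still load even when this check says `False`. The function names are hypothetical, and the path you pass (for example a `.so` under the `torch` package directory) is up to you.

```python
import mmap
import struct
import sys


def elf_section_names(path):
    """Return the section names of a 64-bit ELF file (e.g. a Linux .so)."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as data:
        if data[:4] != b"\x7fELF" or data[4] != 2:  # EI_CLASS 2 means 64-bit
            raise ValueError(f"{path} is not a 64-bit ELF file")
        end = "<" if data[5] == 1 else ">"  # EI_DATA 1 means little-endian
        (shoff,) = struct.unpack_from(end + "Q", data, 0x28)  # e_shoff
        # e_shentsize, e_shnum, e_shstrndx
        shentsize, shnum, shstrndx = struct.unpack_from(end + "HHH", data, 0x3A)

        def section(i):
            base = shoff + i * shentsize
            (name_off,) = struct.unpack_from(end + "I", data, base)  # sh_name
            offset, size = struct.unpack_from(end + "QQ", data, base + 24)  # sh_offset, sh_size
            return name_off, offset, size

        # The section-name string table holds NUL-terminated names.
        _, str_off, str_size = section(shstrndx)
        strtab = data[str_off:str_off + str_size]
        names = []
        for i in range(shnum):
            name_off, _, _ = section(i)
            names.append(strtab[name_off:strtab.index(b"\0", name_off)].decode())
        return names


def has_debug_info(path):
    """True if the file contains an uncompressed DWARF .debug_info section."""
    return ".debug_info" in elf_section_names(path)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path, "has debug info:", has_debug_info(path))
```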

I’m aiming to get crash output like the text at the bottom of this question.

I can see that gdb is used here. Any tips on the steps needed to make gdb kick in when the crash happens?

Epoch: [268][670/782]   Time 0.354 (0.331)      Data 0.001 (0.001)      Loss 0.0003 (0.0014)    Prec@1 100.000 (99.984) Prec@5 100.000 (100.000)

Thread 6 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fff22ba1700 (LWP 23233)]
std::__push_heap<__gnu_cxx::__normal_iterator<torch::autograd::FunctionTask*, std::vector<torch::autograd::FunctionTask, std::allocator<torch::autograd::FunctionTask> > >, long, torch::autograd::FunctionTask, torch::autograd::CompareFunctionTaskTime> (__first=..., __holeIndex=5, __topIndex=__topIndex@entry=0, __value=..., __comp=__comp@entry=...)
    at /home/why/anaconda3/gcc/include/c++/bits/stl_heap.h:182
182           while (__holeIndex > __topIndex
(gdb) bt
#0  std::__push_heap<__gnu_cxx::__normal_iterator<torch::autograd::FunctionTask*, std::vector<torch::autograd::FunctionTask, std::allocator<torch::autograd::FunctionTask> > >, long, torch::autograd::FunctionTask, torch::autograd::CompareFunctionTaskTime> (__first=..., 
    __holeIndex=5, __topIndex=__topIndex@entry=0, __value=..., __comp=__comp@entry=...) at /home/why/anaconda3/gcc/include/c++/bits/stl_heap.h:182
#1  0x00007fffecd33480 in std::push_heap<__gnu_cxx::__normal_iterator<torch::autograd::FunctionTask*, std::vector<torch::autograd::FunctionTask> >, torch::autograd::CompareFunctionTaskTime> (__comp=..., __last=..., __first=...)
    at /home/why/anaconda3/gcc/include/c++/bits/stl_heap.h:221
#2  std::priority_queue<torch::autograd::FunctionTask, std::vector<torch::autograd::FunctionTask, std::allocator<torch::autograd::FunctionTask> >, torch::autograd::CompareFunctionTaskTime>::push(torch::autograd::FunctionTask&&) (
    __x=<unknown type in /home/why/anaconda3/lib/python3.6/site-packages/torch/_C.cpython-36m-x86_64-linux-gnu.so, CU 0x2314a43, DIE 0x23cd8e1>, this=0x7fff2be84fe0) at /home/why/anaconda3/gcc/include/c++/bits/stl_queue.h:507
#3  torch::autograd::ReadyQueue::push (this=0x7fff2be84fe0, item=...) at torch/csrc/autograd/engine.cpp:128
#4  0x00007fffecd364ed in torch::autograd::Engine::thread_main (this=0x7fffee562680 <engine>, graph_task=0x0) at torch/csrc/autograd/engine.cpp:199
#5  0x00007fffecd329d4 in torch::autograd::Engine::thread_init (this=this@entry=0x7fffee562680 <engine>, device=device@entry=-1) at torch/csrc/autograd/engine.cpp:150
#6  0x00007fffecd5bf0a in torch::autograd::python::PythonEngine::thread_init (this=0x7fffee562680 <engine>, device=-1) at torch/csrc/autograd/python_engine.cpp:34
#7  0x00007ffff7b0dc80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#8  0x00007ffff73376ba in start_thread (arg=0x7fff22ba1700) at pthread_create.c:333
#9  0x00007ffff67553dd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
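One common way to make gdb kick in is to start the Python interpreter under gdb. gdb then stops the process when the segfault arrives, and you can print a backtrace, as in the session above. The sketch below assumes gdb is installed and the crashing code is in a script called `train.py`; that script name and the helper `gdb_command` are hypothetical. In `-batch` mode gdb runs the `-ex` commands in order and exits. `run` returns when the program stops, and the backtrace command then prints the stack. The crash above happened on a worker thread ("Thread 6"), so the default here is `thread apply all bt` rather than plain `bt`.

```python
import shlex
import sys


def gdb_command(script, *script_args, all_threads=True):
    """Build a gdb command line that runs `python script ...` under gdb.

    In batch mode gdb executes the -ex commands in order and then exits:
    "run" starts the program and returns when it stops (for example on
    SIGSEGV), and the backtrace command then prints the stack.
    """
    backtrace = "thread apply all bt" if all_threads else "bt"
    return [
        "gdb",
        "-batch",          # run the -ex commands, then exit
        "-ex", "run",      # start the program being debugged
        "-ex", backtrace,  # executed once the program stops
        "--args", sys.executable, script, *script_args,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line to paste into a terminal.
    print(shlex.join(gdb_command("train.py")))
```

If the script exits normally, the backtrace command just reports that there is no stack. File names and line numbers for the C++ frames appear only if the PyTorch libraries contain debug info.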

ptrblck · August 19, 2018, 9:59pm

@colesbury gave some instructions in this thread. Maybe they can help you debug your issue.

yoelshoshan · August 20, 2018, 9:55am

Great, thanks! I will give it a try.

smth · October 16, 2022, 11:22pm

If you set the environment variable TORCH_SHOW_CPP_STACKTRACES=1, full C++ stack traces are shown in the error messages raised from PyTorch's C++ code.

This is described in CONTRIBUTING.md in the pytorch/pytorch repository on GitHub.
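As far as I know, this variable affects errors that PyTorch raises from C++ exceptions. A hard crash such as SIGSEGV raises no exception, so it still needs a debugger. The variable may be read only once per process, so a safe pattern is to set it before `torch` is imported. One way is to launch the workload in a child process with the variable in its environment. The sketch below shows this; `env_with_cpp_stacktraces` is a hypothetical helper, and the `-c` snippet only demonstrates that the child sees the variable.

```python
import os
import subprocess
import sys


def env_with_cpp_stacktraces(base=None):
    """Return a copy of `base` (default: os.environ) with the variable set."""
    env = dict(os.environ if base is None else base)
    env["TORCH_SHOW_CPP_STACKTRACES"] = "1"
    return env


if __name__ == "__main__":
    # Demo child process: prints the value it inherited ("1").
    child = "import os; print(os.environ.get('TORCH_SHOW_CPP_STACKTRACES'))"
    subprocess.run([sys.executable, "-c", child], env=env_with_cpp_stacktraces(), check=True)
```

In practice you would replace the `-c` snippet with your own script. Alternatively, assign `os.environ["TORCH_SHOW_CPP_STACKTRACES"] = "1"` at the very top of the script, before `import torch`.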