Tensor.backward called within C++ extension hangs

Calling .backward() from within a C++ extension hangs.
Tested on Ubuntu and Arch Linux, one with CUDA and one CPU-only, with PyTorch 1.3 (stable) and Python 3.7.

Here is a minimal C++ example (say, diff.cpp):

#include <torch/extension.h>

void backw(torch::Tensor tens) {
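    // arguments: gradient = {} (autograd fills in ones for a scalar output), keep_graph = true, create_graph = true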
    tens.backward({}, true, true);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("backw", &backw, "DIFF backw");
}

With setup.py:

from setuptools import setup, Extension
from torch.utils import cpp_extension

setup(name='diff_cpp',
      ext_modules=[cpp_extension.CppExtension('diff_cpp', ['diff.cpp'])],
      cmdclass={'build_ext': cpp_extension.BuildExtension})

and installing it with python setup.py install, the following hangs:

import torch
import diff_cpp  # fine
x = torch.tensor([1.0, 2.0])
y = torch.sum(x)
diff_cpp.backw(y)  # hangs

I haven’t been able to find out where it hangs yet; interrupting the process doesn’t work and I have to kill it.

Can you run your program in gdb to check where it hangs, please?
gdb python, then r your_script.py at the (gdb) prompt.

Here’s the gdb output (with the script above, except that I forgot to add requires_grad=True there):

(gdb) r test.py
Starting program: /home/mtgd/cpp-test/env/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
[New Thread 0x7ffff4bc8700 (LWP 129784)]
[New Thread 0x7ffff43c7700 (LWP 129785)]
[New Thread 0x7fffefbc6700 (LWP 129786)]
[Thread 0x7fffefbc6700 (LWP 129786) exited]
[Thread 0x7ffff43c7700 (LWP 129785) exited]
[Thread 0x7ffff4bc8700 (LWP 129784) exited]
[Detaching after fork from child process 129787]
/usr/lib/../share/gcc-9.2.0/python/libstdcxx/v6/xmethods.py:731: SyntaxWarning: list indices must be integers or slices, not str; perhaps you missed a comma?
  refcounts = ['_M_refcount']['_M_pi']
[New Thread 0x7fffefbc6700 (LWP 129790)]
[New Thread 0x7ffff43c7700 (LWP 129791)]
^C    <----- hangs here
Thread 1 "python" received signal SIGINT, Interrupt.
0x00007ffff7a81c45 in pthread_cond_wait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0

Perfect. And what is the stack trace when you type bt after interrupting?

(gdb) bt
#0  0x00007ffff7a81c45 in pthread_cond_wait@@GLIBC_2.3.2 ()
   from /usr/lib/libpthread.so.0
#1  0x00007fffe9c061c1 in __gthread_cond_wait (__mutex=<optimized out>, 
    __cond=<optimized out>)
    at /build/gcc/src/gcc-build/x86_64-pc-linux-gnu/libstdc++-v3/include/x86_64-pc-linux-gnu/bits/gthr-default.h:865
#2  std::condition_variable::wait (this=<optimized out>, __lock=...)
    at /build/gcc/src/gcc/libstdc++-v3/src/c++11/condition_variable.cc:53
#3  0x00007fffdfb899dc in torch::autograd::Engine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) ()
   from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/lib/libtorch.so
#4  0x00007fffea18015e in torch::autograd::python::PythonEngine::execute(std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&, std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&, bool, bool, std::vector<torch::autograd::Edge, std::allocator<torch::autograd::Edge> > const&) ()
   from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/lib/libtorch_python.so
rad::Variable, std::allocator<torch::autograd::Variable> > const&, std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&, bool, bool, std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&, bool) () from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/lib/libtorch.so
#6  0x00007fffdfb780e8 in torch::autograd::backward(std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&, std::vector<torch::autograd::Variable, std::allocator<torch::autograd::Variable> > const&, c10::optional<bool>, bool) () from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/lib/libtorch.so
#7  0x00007fffdfbb1ac5 in torch::autograd::Variable::backward(at::Tensor const&, bool, bool) const () from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/lib/libtorch.so
#8  0x00007fffdc2308de in at::ATenOpTable::callUnboxed<void, at::Tensor const&, at::Tensor const&, bool, bool> (this=0x555555b80150)
    at /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/c10/util/llvmMathExtras.h:204
#9  at::Tensor::backward (create_graph=true, keep_graph=true, gradient=..., this=0x7fffffffdbd0) at /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/TensorMethods.h:65
#10 backw (tens=...) at diff.cpp:4
#11 0x00007fffdc23227b in pybind11::detail::argument_loader<at::Tensor>::call_impl<void, void (*&)(at::Tensor), 0ul, pybind11::detail::void_type> (f=<optimized out>, this=0x7fffffffdbb8)
    at /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/c10/util/intrusive_ptr.h:216
#12 pybind11::detail::argument_loader<at::Tensor>::call<void, pybind11::detail::void_type, void (*&)(at::Tensor)>(void (*&)(at::Tensor)) && (f=<optimized out>, this=0x7fffffffdbb8)
    at /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/pybind11/cast.h:1913
#13 pybind11::cpp_function::initialize<void (*&)(at::Tensor), void, at::Tensor, pybind11::name, pybind11::scope, pybind11::sibling, char [11]>(void (*&)(at::Tensor), void (*)(at::Tensor), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [11])::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const (call=..., this=0x0)
    at /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/pybind11/pybind11.h:155
#14 pybind11::cpp_function::initialize<void (*&)(at::Tensor), void, at::Tensor, pybind11::name, pybind11::scope, pybind11::sibling, char [11]>(void (*&)(at::Tensor), void (*)(at::Tensor), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [11])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) ()
    at /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/pybind11/pybind11.h:133
#15 0x00007fffdc23a87e in pybind11::cpp_function::dispatcher (self=<optimized out>, args_in=0x7ffff74a8d50, kwargs_in=0x0)
    at /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/pybind11/pybind11.h:620
#16 0x00007ffff7bce688 in _PyMethodDef_RawFastCallKeywords () from /usr/lib/libpython3.7m.so.1.0
#17 0x00007ffff7bce784 in _PyCFunction_FastCallKeywords () from /usr/lib/libpython3.7m.so.1.0
#18 0x00007ffff7beccc4 in ?? () from /usr/lib/libpython3.7m.so.1.0
#19 0x00007ffff7c1c9b4 in _PyEval_EvalFrameDefault () from /usr/lib/libpython3.7m.so.1.0
#20 0x00007ffff7be0fd8 in _PyEval_EvalCodeWithName () from /usr/lib/libpython3.7m.so.1.0
#21 0x00007ffff7be1dba in PyEval_EvalCodeEx () from /usr/lib/libpython3.7m.so.1.0
#22 0x00007ffff7c5eebc in PyEval_EvalCode () from /usr/lib/libpython3.7m.so.1.0
#23 0x00007ffff7c90345 in ?? () from /usr/lib/libpython3.7m.so.1.0
#24 0x00007ffff7b78e67 in PyRun_FileExFlags () from /usr/lib/libpython3.7m.so.1.0
#25 0x00007ffff7b87b37 in PyRun_SimpleFileExFlags () from /usr/lib/libpython3.7m.so.1.0
#26 0x00007ffff7c9a640 in ?? () from /usr/lib/libpython3.7m.so.1.0
#27 0x00007ffff7c9a6bc in _Py_UnixMain () from /usr/lib/libpython3.7m.so.1.0
#28 0x00007ffff7dfd153 in __libc_start_main () from /usr/lib/libc.so.6
#29 0x000055555555505e in _start ()

So this one is where it should be.
You can use thread x to switch to the xth thread.
Can you get the backtraces from the other threads by switching to each one and calling bt?

Thanks for your help!

I am not familiar with gdb, so please excuse my ignorance, but it seems there is only one thread. The only valid command is thread 1 (no 2, 3, …). The hexadecimals don’t seem to be threads either.

Never mind, some experimenting got me threads 1 (the one above), 5, and 6:

Thread 5

(gdb) bt
#0  0x00007ffff7a81f7a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007ffff7b9c3ce in PyEval_RestoreThread () from /usr/lib/libpython3.7m.so.1.0
#2  0x00007ffff7b14c3e in ?? () from /usr/lib/libpython3.7m.so.1.0
#3  0x00007fffea180015 in torch::autograd::python::PythonEngine::thread_init(int) () from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#4  0x00007fffeaaf75ef in execute_native_thread_routine () from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so
#5  0x00007ffff7a7b4cf in start_thread () from /usr/lib/libpthread.so.0
#6  0x00007ffff7ed52d3 in clone () from /usr/lib/libc.so.6

Thread 6

(gdb) bt
#0  0x00007ffff7a81f7a in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib/libpthread.so.0
#1  0x00007ffff7b9c3ce in PyEval_RestoreThread () from /usr/lib/libpython3.7m.so.1.0
#2  0x00007ffff7b14c3e in ?? () from /usr/lib/libpython3.7m.so.1.0
#3  0x00007fffea180015 in torch::autograd::python::PythonEngine::thread_init(int) () from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/lib/libtorch_python.so
#4  0x00007fffeaaf75ef in execute_native_thread_routine () from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so
#5  0x00007ffff7a7b4cf in start_thread () from /usr/lib/libpthread.so.0
#6  0x00007ffff7ed52d3 in clone () from /usr/lib/libc.so.6

According to thread find ., those are all of them.

A similar example using exclusively the C++ interface of the same PyTorch version works without problems (.backward() does not hang).
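
For reference, the C++-only version I mean looks roughly like the sketch below (a minimal reconstruction, not the exact file I built):

#include <torch/torch.h>
#include <iostream>

int main() {
    // Same computation as the Python snippet, but driven entirely from C++.
    torch::Tensor x = torch::tensor({1.0, 2.0}, torch::requires_grad());
    torch::Tensor y = torch::sum(x);
    y.backward();  // completes without hanging
    std::cout << x.grad() << std::endl;
    return 0;
}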

It would already be interesting to know whether the above issue occurs for anyone else, or whether it runs fine for someone.

Looks like a GIL issue. @ezyang, could this be caused by the change to the pybind11 handler?
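
If it is, one workaround worth trying on the extension side (just a sketch, I haven’t run it against your exact setup) is to release the GIL before calling backward, e.g. with pybind11’s gil_scoped_release, which torch/extension.h already pulls in:

#include <torch/extension.h>

void backw(torch::Tensor tens) {
    // Drop the GIL held by the calling Python thread; the autograd engine's
    // worker threads may need to acquire it during the backward pass, so
    // keeping it held here can deadlock.
    pybind11::gil_scoped_release no_gil;
    tens.backward({}, true, true);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("backw", &backw, "DIFF backw");
}

Releasing it per call from the binding itself, via m.def("backw", &backw, "DIFF backw", pybind11::call_guard<pybind11::gil_scoped_release>()), should be equivalent.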

@mtgd, could you check whether you still see this when you use the nightly build?

Trying to use the nightly version fails completely for me. I’m not sure whether I should open another topic for this; maybe I made a mistake somewhere. For reference, here is exactly what I tried:

  • Fresh python3.7 virtual environment: python3.7 -m venv env
  • Activate venv
  • Follow the nightly install instructions
    pip install numpy
    pip install --pre torch torchvision -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
    
  • Try to install the extension exactly as outlined above

Running python setup.py install gives me a very lengthy error message ending in: command ‘gcc’ failed with exit status 1.

The whole output is 23169 lines long, so I can hardly post it here. It starts with:

which: no nvcc in ($PATH)

which is confusing to me since I installed the CPU-optimized version (there is no CUDA on my platform). After this it looks normal:

running install
running bdist_egg
running egg_info
writing diff_cpp.egg-info/PKG-INFO
writing dependency_links to diff_cpp.egg-info/dependency_links.txt
writing top-level names to diff_cpp.egg-info/top_level.txt
reading manifest file 'diff_cpp.egg-info/SOURCES.txt'
writing manifest file 'diff_cpp.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
building 'diff_cpp' extension
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -march=x86-64 -mtune=generic -O3 -pipe -fno-plt -fPIC -I/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include -I/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/TH -I/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/THC -I/home/mtgd/cpp-test/env/include -I/usr/include/python3.7m -c diff.cpp -o build/temp.linux-x86_64-3.7/diff.o -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=diff_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11

Then comes the error output:

In file included from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/TensorMethods.h:10,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/Tensor.h:12,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/Context.h:4,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/ATen.h:5,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/all.h:4,
                 from /home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/torch/extension.h:4,
                 from diff.cpp:1:
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h: In instantiation of ‘Return c10::Dispatcher::doCallUnboxedOnly(const c10::DispatchTable&, const c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction> >&, Args ...) const [with Return = void; Args = {const at::Tensor&, const at::Tensor&, bool, bool}]’:
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:201:114:   required from ‘Return c10::Dispatcher::callUnboxedOnly(const c10::OperatorHandle&, Args ...) const [with Return = void; Args = {const at::Tensor&, const at::Tensor&, bool, bool}]’
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/TensorMethods.h:66:75:   required from here
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:211:80: error: redeclaration of ‘const at::Tensor& args#0’
  211 |     return kernel.template callUnboxedOnly<Return, Args...>(std::forward<Args>(args)...);
      |                                                                                ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:209:115: note: ‘const at::Tensor& args#0’ previously declared here
  209 |     c10::optional<TensorTypeId> dispatchKey = dispatchTable.dispatchKeyExtractor().getDispatchKeyUnboxed<Args...>(args...);
      |                                                                                                                   ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:211:80: error: redeclaration of ‘const at::Tensor& args#1’
  211 |     return kernel.template callUnboxedOnly<Return, Args...>(std::forward<Args>(args)...);
      |                                                                                ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:209:115: note: ‘const at::Tensor& args#1’ previously declared here
  209 |     c10::optional<TensorTypeId> dispatchKey = dispatchTable.dispatchKeyExtractor().getDispatchKeyUnboxed<Args...>(args...);
      |                                                                                                                   ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:211:80: error: redeclaration of ‘bool& args#2’
  211 |     return kernel.template callUnboxedOnly<Return, Args...>(std::forward<Args>(args)...);
      |                                                                                ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:209:115: note: ‘bool& args#2’ previously declared here
  209 |     c10::optional<TensorTypeId> dispatchKey = dispatchTable.dispatchKeyExtractor().getDispatchKeyUnboxed<Args...>(args...);
      |                                                                                                                   ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:211:80: error: redeclaration of ‘bool& args#3’
  211 |     return kernel.template callUnboxedOnly<Return, Args...>(std::forward<Args>(args)...);
      |                                                                                ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:209:115: note: ‘bool& args#3’ previously declared here
  209 |     c10::optional<TensorTypeId> dispatchKey = dispatchTable.dispatchKeyExtractor().getDispatchKeyUnboxed<Args...>(args...);
      |                                                                                                                   ^~~~
/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/ATen/core/dispatch/Dispatcher.h:208:38: error: member ‘c10::Dispatcher::doCallUnboxedOnly(const c10::DispatchTable&, const c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction> >&, Args ...) const [with Return = void; Args = {const at::Tensor&, const at::Tensor&, bool, bool}]::<lambda(const ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction>&)>::<args#0 capture>’ is uninitialized reference
  208 |   return backendFallbackKernels.read([&] (const ska::flat_hash_map<TensorTypeId, KernelFunction>& backendFallbackKernels) -> Return {
      |                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  209 |     c10::optional<TensorTypeId> dispatchKey = dispatchTable.dispatchKeyExtractor().getDispatchKeyUnboxed<Args...>(args...);
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  210 |     const KernelFunction& kernel = dispatch_(dispatchTable, backendFallbackKernels, dispatchKey);
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  211 |     return kernel.template callUnboxedOnly<Return, Args...>(std::forward<Args>(args)...);
      |     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  212 |   });
      |   ~       

It goes on like this for quite a stretch; the last line is:

/home/mtgd/cpp-test/env/lib/python3.7/site-packages/torch/include/c10/util/LeftRight.h:67:10: error: ‘typename std::result_of<F(const T&)>::type c10::LeftRight<T>::read(F&&) const [with F = c10::Dispatcher::doCallUnboxedOnly(const c10::DispatchTable&, const c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction> >&, Args ...) const [with Return = at::Tensor; Args = {const at::Tensor&, c10::ArrayRef<long int>, c10::ArrayRef<long int>, c10::ArrayRef<long int>, c10::ArrayRef<long int>}]::<lambda(const ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction>&)>; T = ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction>; typename std::result_of<F(const T&)>::type = at::Tensor]’, declared using local type ‘c10::Dispatcher::doCallUnboxedOnly(const c10::DispatchTable&, const c10::LeftRight<ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction> >&, Args ...) const [with Return = at::Tensor; Args = {const at::Tensor&, c10::ArrayRef<long int>, c10::ArrayRef<long int>, c10::ArrayRef<long int>, c10::ArrayRef<long int>}]::<lambda(const ska::flat_hash_map<c10::TensorTypeId, c10::KernelFunction>&)>’, is used but never defined [-fpermissive]

I’m not sure where to go from here. The content of the extension doesn’t seem to matter; I get the same error even if the source file is completely empty (except for the include).
In case the full error message would be useful, please let me know.
Thanks for your time!

Oh, it’s the first time I’ve seen this one. What is your gcc version? We recently increased the minimum version to 5+ to be able to use more advanced C++ features for the dispatcher (which seems to be where the error comes from).

My gcc version is 9.2.0, so that shouldn’t be the issue.

cc @Sebastian_Messmer, who may have a better idea why this might happen.

I’ve just tried the same thing as above in multiple configurations (the nightly version did install with CUDA on Ubuntu):

  • Arch linux, torch 1.2, python 3.8, no CUDA
  • Ubuntu, torch 1.3, python 3.6, CUDA 10.1
  • Ubuntu, nightly build, python 3.6, CUDA 10.1

In all cases, .backward() hangs.

It would already be useful to know whether anyone has gotten this to run.

I have also created an issue on GitHub asking about this: https://github.com/pytorch/pytorch/issues/32045