PyTorch CUDA problems with pybind11

Good afternoon,

I am wondering whether anyone has run into the following issue. I am trying to embed a Python interpreter in C++ using pybind11 and then, inside the interpreter, run some operations on tensors on the GPU. The problem is that as soon as I send a tensor to the GPU with a call such as .cuda() or .to(device) (with the device being cuda), my main C++ program either hangs or aborts with a core dump.

My C++ code is as follows:

#include <pybind11/pybind11.h>
#include <pybind11/embed.h>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{};
    py::module m = py::module::import("cuda_test");
    py::function test_func = m.attr("test_func");
    auto test = test_func().cast<py::int_>();
    return 0;
}

My Python code is as follows:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def test_func():
    a = torch.tensor([0]).to(device)  # pass the device object, not the string "device"
    print(a.is_cuda)
    return 0

The code compiles fine, and it works so long as I set the device to “cpu”. If the device is “cuda” then the program either hangs indefinitely, or produces the message “Aborted (core dumped)”. For reference, I am running this on Ubuntu 20.04. Oddly enough, I can see the process in nvidia-smi, so something must be happening with the GPU. Any ideas on what is happening here? Has anyone experienced this?

Thank you again for any ideas or suggestions. I am a complete novice, so I may be doing something incorrect which is very simple.

OK, a bit of extra information: I have determined that I was including ArrayFire headers and libraries in my makefile (though I didn’t include a directive in my code above), and this seems to be the linchpin of the whole thing. If I remove the ArrayFire headers and libraries, everything works fine! Unfortunately, I need my code to work alongside ArrayFire, so now my question is: is it known that ArrayFire and PyTorch conflict with each other in this context?
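One way to look into a possible clash is to check which GPU-related shared libraries actually end up loaded in the process. The following is only a rough sketch, assuming Linux (it reads /proc/self/maps). The helper names parse_maps and loaded_gpu_libraries are made up for illustration, and the keyword list is a guess at typical CUDA and ArrayFire library names, not an authoritative list.

```python
import os

# Assumed substrings of typical CUDA / ArrayFire library file names (a guess).
DEFAULT_KEYWORDS = ("cuda", "cudnn", "cublas", "nvrtc", "libaf")

def parse_maps(maps_text, keywords=DEFAULT_KEYWORDS):
    """Return the sorted, unique shared-object paths found in the text of
    /proc/<pid>/maps whose file name contains one of the keywords."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [path]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = os.path.basename(path).lower()
        if ".so" in name and any(k in name for k in keywords):
            found.add(path)
    return sorted(found)

def loaded_gpu_libraries():
    """Linux only: list matching libraries mapped into the current process."""
    with open("/proc/self/maps") as f:
        return parse_maps(f.read())
```

Calling print(loaded_gpu_libraries()) at the top of test_func, after import torch, would show whether two different copies of the CUDA runtime (for example one from the conda environment and one pulled in by ArrayFire) are loaded side by side. That would be a hint, not proof, of a conflict.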

I don’t know what might be causing the issue, but you might need to debug it by checking what the actual error is via gdb and the corresponding backtrace.
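In addition to gdb, Python's standard faulthandler module can print the Python-level traceback when the process receives a fatal signal such as SIGSEGV or SIGABRT. A minimal sketch, assuming it is placed at the top of cuda_test.py before import torch; the wrapper name enable_crash_traceback is made up for illustration:

```python
import faulthandler
import sys

def enable_crash_traceback(stream=None):
    """Ask CPython to dump the Python tracebacks of all threads to the
    given stream (default: stderr) when a fatal signal is received."""
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)
    return faulthandler.is_enabled()
```

This does not fix anything by itself, but it may show which Python line (for example an import inside torch) was executing when the crash happened, which the raw gdb backtrace does not make obvious.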

Hi ptrblck,

Thank you for your response! I am pasting the output of running gdb and bt below. Unfortunately, I am a complete novice and have no clue what the following output means, but it may be useful for anyone else who is attempting the same thing.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe9140700 (LWP 832421)]
[New Thread 0x7fffe893f700 (LWP 832422)]
[New Thread 0x7fffe1dff700 (LWP 832423)]
[New Thread 0x7fffe15fe700 (LWP 832424)]
[New Thread 0x7fffe0dfd700 (LWP 832425)]
[New Thread 0x7fffc5fff700 (LWP 832426)]
[New Thread 0x7fffc57fe700 (LWP 832427)]
[New Thread 0x7fff3f801700 (LWP 832431)]

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
0x00005555566d2fb0 in ?? ()
(gdb) bt
#0 0x00005555566d2fb0 in ?? ()
#1 0x00007fff3e7a8dc0 in ?? ()
#2 0x00007fff3e99a81c in ?? ()
#3 0x00007fff3e99a810 in ?? ()
#4 0x00007fff3e9daf58 in ?? ()
#5 0x00007ffff7d5a4eb in _Py_Dealloc (
op=) at /usr/local/src/conda/python-3.10.9/Objects/object.c:2295
#6 _Py_DECREF (op=0x555556cb1040)
at /usr/local/src/conda/python-3.10.9/Include/object.h:500
#7 _Py_XDECREF (op=0x555556cb1040)
at /usr/local/src/conda/python-3.10.9/Include/object.h:567
#8 _PyEval_EvalFrameDefault (tstate=, f=,
throwflag=)
at /usr/local/src/conda/python-3.10.9/Python/ceval.c:4280
#9 0x00007ffff7d663a7 in PyDict_GetItemWithError (key=0x7fff31550280, op=0x0)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:1306
#10 _PyObject_GenericGetAttrWithDict (obj=,
name=0x7fff31550280, dict=0x0, suppress=0)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:1285
#11 0x0000555555573420 in ?? ()
#12 0x00007fff3e7b02b0 in ?? ()
#13 0x00007fff31542610 in ?? ()
#14 0x00007fff3e7b02b0 in ?? ()
#15 0x00005555566d2fb0 in ?? ()
#16 0x00007fff3ea33070 in ?? ()
#17 0x00007ffff7d754c4 in PyObject_GenericGetAttr (
name=, obj=0x7fff316e7880)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:1333
#18 PyObject_GetAttr (name=0x7fff3e7b02b0, v=0x7fff316e7880)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:932
#19 _PyObject_GetAttrId (name=0x7ffff7fa2f30 <PyId__initializing.3>,
v=)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:872
#20 PyModuleSpec_IsInitializing (
spec=)
at /usr/local/src/conda/python-3.10.9/Objects/moduleobject.c:709
#21 import_ensure_initialized (name=0x555555573ae0 <Py_FalseStruct>,
mod=0x5555566d2fb0, interp=)
at /usr/local/src/conda/python-3.10.9/Python/import.c:358
#22 PyImport_ImportModuleLevelObject (name=0x7fff3ea0af30,
globals=, locals=, fromlist=,
level=)
at /usr/local/src/conda/python-3.10.9/Python/import.c:1617
#23 0x00007ffff7d75987 in PyImport_ImportModuleLevelObject (
name=0x7fff3e79c800, globals=, locals=,
fromlist=, level=)
at /usr/local/src/conda/python-3.10.9/Include/object.h:500
#24 0x00007ffff7d818f8 in builtin___import__
(self=,
args=, kwds=)
at /usr/local/src/conda/python-3.10.9/Python/bltinmodule.c:272
#25 0x00007fff3e79d620 in ?? ()
#26 0x00005555566d2fb0 in ?? ()
#27 0x00007fff3ea2c9a0 in ?? ()
#28 0x0000000000000000 in ?? ()

Unfortunately, I don’t see the root cause of the issue in the stack trace, apart from a _Py_Dealloc call on an object that seems to have been freed already.

Hello, I am also encountering this issue when using CUDA in a pybind11 embedded interpreter. Have you figured it out?