PyTorch CUDA problems with pybind11

georgeh · February 11, 2023, 12:49am

Good Afternoon,

I am wondering if anyone has experienced the following issue. I am trying to embed a Python interpreter in C++ using pybind11 and then, inside the interpreter, run some operations on tensors on the GPU. The problem is that as soon as I send a tensor to the GPU with a call such as .cuda() or .to(device) (with the device being CUDA), my main C++ program either hangs or aborts with a core dump.

My C++ code is as follows:

#include <pybind11/pybind11.h>
#include <pybind11/embed.h>

namespace py = pybind11;

int main() {
    py::scoped_interpreter guard{};
    py::module m = py::module::import("cuda_test");
    py::function test_func = m.attr("test_func");
    auto test = test_func().cast<py::int_>();
    return 0;
}

And my Python code is as follows:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def test_func():
    a = torch.tensor([0]).to(device)
    print(a.is_cuda)
    return 0

The code compiles fine, and it works so long as I set the device to “cpu”. If the device is “cuda” then the program either hangs indefinitely, or produces the message “Aborted (core dumped)”. For reference, I am running this on Ubuntu 20.04. Oddly enough, I can see the process in nvidia-smi, so something must be happening with the GPU. Any ideas on what is happening here? Has anyone experienced this?

Thank you again for any ideas or suggestions. I am a complete novice, so I may be doing something incorrect which is very simple.
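One way to narrow this down is to check whether the same Python function also fails in a plain Python process, with no C++ host involved. The following is a minimal sketch of that check, not a confirmed diagnosis: `run_isolated` and `describe_exit` are hypothetical helper names, and the only names taken from the post are `cuda_test` and `test_func`.

```python
import signal
import subprocess
import sys


def run_isolated(module_name, func_name, timeout=60):
    """Call module_name.func_name() in a fresh Python process and return its exit code.

    Raises subprocess.TimeoutExpired if the child hangs longer than `timeout` seconds.
    """
    code = f"import {module_name}; {module_name}.{func_name}()"
    result = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    return result.returncode


def describe_exit(returncode):
    """Translate a subprocess exit code into a readable string."""
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"
```

Calling `describe_exit(run_isolated("cuda_test", "test_func"))` from the directory that contains cuda_test.py reports whether the standalone run exits cleanly or is killed by a signal. If it exits cleanly, the problem is more likely in how the C++ executable is built and linked than in the Python code itself.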

georgeh · February 13, 2023, 11:08pm

Ok, just as a bit of extra information, I have determined that I was including arrayfire headers and libraries in my makefile (though I didn’t include a directive in my code above), and this seems to be the linchpin of this whole thing - if I remove the arrayfire headers and libraries then everything works fine! Unfortunately, I need my code to work alongside arrayfire, so now my question is: Is it known that arrayfire and pytorch conflict with each other somehow in this context?
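One hypothesis worth checking, which this thread does not confirm, is that linking ArrayFire pulls a second set of CUDA libraries into the same process as the ones PyTorch loads. The sketch below lists the matching shared objects mapped into the current process by reading /proc/self/maps (Linux only). The helper names and the library name pattern are assumptions made for illustration.

```python
import os
import re


def loaded_libraries(pattern=r"cuda|cublas|cudnn|nvrtc|libaf",
                     maps_path="/proc/self/maps"):
    """Return the sorted paths of mapped shared objects whose file name matches pattern."""
    if not os.path.exists(maps_path):
        return []  # not on Linux, nothing to inspect
    matcher = re.compile(pattern, re.IGNORECASE)
    found = set()
    with open(maps_path) as maps:
        for line in maps:
            # Columns: address, perms, offset, dev, inode, pathname (optional).
            fields = line.split(None, 5)
            if len(fields) < 6:
                continue  # anonymous mapping without a path
            path = fields[5].strip()
            if ".so" in path and matcher.search(os.path.basename(path)):
                found.add(path)
    return sorted(found)


def duplicates_by_name(paths):
    """Group paths by library name (version suffix removed) and keep names seen more than once."""
    by_name = {}
    for path in paths:
        name = os.path.basename(path).split(".so")[0]
        by_name.setdefault(name, []).append(path)
    return {name: group for name, group in by_name.items() if len(group) > 1}
```

Printing `duplicates_by_name(loaded_libraries())` inside cuda_test.py right after `import torch` would show, for example, whether two different copies of the same CUDA library are loaded from different directories. An empty result does not rule out a conflict; it only rules out this particular one.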

ptrblck · February 14, 2023, 5:33am

I don’t know what might be causing the issue, but you might need to debug it by checking what the actual error is via gdb and the corresponding backtrace.
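As a complement to gdb, Python's standard `faulthandler` module can print the Python-level traceback when the process receives a fatal signal, and can dump it after a timeout if the process hangs. The following is a minimal sketch meant to go at the top of cuda_test.py. `hang_watchdog` is a hypothetical helper name, and the 30-second default is an arbitrary choice.

```python
import contextlib
import faulthandler
import sys

# Dump the Python traceback of all threads to stderr on fatal signals
# such as SIGSEGV or SIGABRT.
faulthandler.enable(file=sys.stderr, all_threads=True)


@contextlib.contextmanager
def hang_watchdog(seconds=30):
    """Dump all thread tracebacks if the wrapped block runs longer than `seconds`."""
    faulthandler.dump_traceback_later(seconds, exit=False)
    try:
        yield
    finally:
        # The block finished in time, so cancel the pending dump.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the body of `test_func` in `with hang_watchdog():` would show which Python line is executing when the program hangs. This only shows Python frames; the native frames still have to come from the gdb backtrace.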

georgeh · February 14, 2023, 7:22pm

Hi ptrblck,

Thank you for your response! I am pasting the output of running gdb and bt below. Unfortunately, I am a complete novice and have no clue what the following message might mean, but it may be useful for anyone else who is attempting the same thing.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe9140700 (LWP 832421)]
[New Thread 0x7fffe893f700 (LWP 832422)]
[New Thread 0x7fffe1dff700 (LWP 832423)]
[New Thread 0x7fffe15fe700 (LWP 832424)]
[New Thread 0x7fffe0dfd700 (LWP 832425)]
[New Thread 0x7fffc5fff700 (LWP 832426)]
[New Thread 0x7fffc57fe700 (LWP 832427)]
[New Thread 0x7fff3f801700 (LWP 832431)]

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
0x00005555566d2fb0 in ?? ()
(gdb) bt
#0 0x00005555566d2fb0 in ?? ()
#1 0x00007fff3e7a8dc0 in ?? ()
#2 0x00007fff3e99a81c in ?? ()
#3 0x00007fff3e99a810 in ?? ()
#4 0x00007fff3e9daf58 in ?? ()
#5 0x00007ffff7d5a4eb in _Py_Dealloc (
op=) at /usr/local/src/conda/python-3.10.9/Objects/object.c:2295
#6 _Py_DECREF (op=0x555556cb1040)
at /usr/local/src/conda/python-3.10.9/Include/object.h:500
#7 _Py_XDECREF (op=0x555556cb1040)
at /usr/local/src/conda/python-3.10.9/Include/object.h:567
#8 _PyEval_EvalFrameDefault (tstate=, f=,
throwflag=)
at /usr/local/src/conda/python-3.10.9/Python/ceval.c:4280
#9 0x00007ffff7d663a7 in PyDict_GetItemWithError (key=0x7fff31550280, op=0x0)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:1306
#10 _PyObject_GenericGetAttrWithDict (obj=,
name=0x7fff31550280, dict=0x0, suppress=0)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:1285
#11 0x0000555555573420 in ?? ()
#12 0x00007fff3e7b02b0 in ?? ()
#13 0x00007fff31542610 in ?? ()
--Type <RET> for more, q to quit, c to continue without paging--
#14 0x00007fff3e7b02b0 in ?? ()
#15 0x00005555566d2fb0 in ?? ()
#16 0x00007fff3ea33070 in ?? ()
#17 0x00007ffff7d754c4 in PyObject_GenericGetAttr (
name=, obj=0x7fff316e7880)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:1333
#18 PyObject_GetAttr (name=0x7fff3e7b02b0, v=0x7fff316e7880)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:932
#19 _PyObject_GetAttrId (name=0x7ffff7fa2f30 <PyId__initializing.3>,
v=)
at /usr/local/src/conda/python-3.10.9/Objects/object.c:872
#20 PyModuleSpec_IsInitializing (
spec=)
at /usr/local/src/conda/python-3.10.9/Objects/moduleobject.c:709
#21 import_ensure_initialized (name=0x555555573ae0 <Py_FalseStruct>,
mod=0x5555566d2fb0, interp=)
at /usr/local/src/conda/python-3.10.9/Python/import.c:358
#22 PyImport_ImportModuleLevelObject (name=0x7fff3ea0af30,
globals=, locals=, fromlist=,
level=)
at /usr/local/src/conda/python-3.10.9/Python/import.c:1617
--Type <RET> for more, q to quit, c to continue without paging--
#23 0x00007ffff7d75987 in PyImport_ImportModuleLevelObject (
name=0x7fff3e79c800, globals=, locals=,
fromlist=, level=)
at /usr/local/src/conda/python-3.10.9/Include/object.h:500
#24 0x00007ffff7d818f8 in builtin___import (self=,
args=, kwds=)
at /usr/local/src/conda/python-3.10.9/Python/bltinmodule.c:272
#25 0x00007fff3e79d620 in ?? ()
#26 0x00005555566d2fb0 in ?? ()
#27 0x00007fff3ea2c9a0 in ?? ()
#28 0x0000000000000000 in ?? ()

ptrblck · February 14, 2023, 9:00pm

Unfortunately, I don’t see the root cause of the issue in the stack trace, apart from a Dealloc call on an object that seems to have been freed already.

ten_ko · June 20, 2023, 2:50am

Hello, I am also encountering this issue when using CUDA in a pybind11 embedded interpreter. Have you figured it out?