Hi @soulslicer, did you ever manage to fix this issue? I think I have the exact same bug:
#include <torch/extension.h>

int square_sum(int size0, int size1, int size2)
{
    return size0*size0 + size1*size1 + size2*size2;
}

torch::Tensor zero_tensor(int size0, int size1, int size2)
{
    return torch::zeros({size0, size1, size2});
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("square_sum", &square_sum, "Square Sum");
    m.def("zero_tensor", &zero_tensor, "Zero Tensor");
}
The square_sum function works, but zero_tensor fails with a segmentation fault, and gdb shows the same thing as it did for you.
I tried it with the precompiled libtorch and with a libtorch I compiled from source. I tried with and without Anaconda. I tried compiling using a setup.py file and with a simple Makefile. I tried changing torch::Tensor to at::Tensor. Nothing seems to work.
Here is some output from valgrind:
==355897== Invalid read of size 8
==355897== at 0x11C6A347: THPVariable_NewWithVar(_typeobject*, at::Tensor, c10::impl::PyInterpreterStatus) (in /home/evilgras/opt/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
==355897== by 0x11C6AB0E: THPVariable_Wrap(at::Tensor) (in /home/evilgras/opt/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
==355897== by 0x486FD85: void pybind11::cpp_function::initialize<at::Tensor (*&)(int, int, int), at::Tensor, int, int, int, pybind11::name, pybind11::scope, pybind11::sibling, char [12]>(at::Tensor (*&)(int, int, int), at::Tensor (*)(int, int, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [12])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (pybind.h:47)
==355897== by 0x486D2D9: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (pybind11.h:767)
==355897== by 0x25C347: cfunction_call_varargs (call.c:743)
==355897== by 0x25C347: PyCFunction_Call (call.c:773)
==355897== by 0x24BDBB: _PyObject_MakeTpCall (call.c:159)
==355897== by 0x2D7665: _PyObject_Vectorcall (abstract.h:125)
==355897== by 0x2D7665: call_function (ceval.c:4963)
==355897== by 0x2D7665: _PyEval_EvalFrameDefault (ceval.c:3469)
==355897== by 0x2A126F: PyEval_EvalFrameEx (ceval.c:741)
==355897== by 0x2A126F: _PyEval_EvalCodeWithName (ceval.c:4298)
==355897== by 0x336542: PyEval_EvalCodeEx (ceval.c:4327)
==355897== by 0x336542: PyEval_EvalCode (ceval.c:718)
==355897== by 0x3365E3: run_eval_code_obj (pythonrun.c:1165)
==355897== by 0x35C853: run_mod (pythonrun.c:1187)
==355897== by 0x21D38F: pyrun_file (pythonrun.c:1084)
==355897== Address 0x130 is not stack'd, malloc'd or (recently) free'd
==355897==
==355897==
==355897== Process terminating with default action of signal 11 (SIGSEGV)
==355897== Access not within mapped region at address 0x130
==355897== at 0x11C6A347: THPVariable_NewWithVar(_typeobject*, at::Tensor, c10::impl::PyInterpreterStatus) (in /home/evilgras/opt/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
==355897== by 0x11C6AB0E: THPVariable_Wrap(at::Tensor) (in /home/evilgras/opt/anaconda3/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
==355897== by 0x486FD85: void pybind11::cpp_function::initialize<at::Tensor (*&)(int, int, int), at::Tensor, int, int, int, pybind11::name, pybind11::scope, pybind11::sibling, char [12]>(at::Tensor (*&)(int, int, int), at::Tensor (*)(int, int, int), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&, char const (&) [12])::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (pybind.h:47)
==355897== by 0x486D2D9: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (pybind11.h:767)
==355897== by 0x25C347: cfunction_call_varargs (call.c:743)
==355897== by 0x25C347: PyCFunction_Call (call.c:773)
==355897== by 0x24BDBB: _PyObject_MakeTpCall (call.c:159)
==355897== by 0x2D7665: _PyObject_Vectorcall (abstract.h:125)
==355897== by 0x2D7665: call_function (ceval.c:4963)
==355897== by 0x2D7665: _PyEval_EvalFrameDefault (ceval.c:3469)
==355897== by 0x2A126F: PyEval_EvalFrameEx (ceval.c:741)
==355897== by 0x2A126F: _PyEval_EvalCodeWithName (ceval.c:4298)
==355897== by 0x336542: PyEval_EvalCodeEx (ceval.c:4327)
==355897== by 0x336542: PyEval_EvalCode (ceval.c:718)
==355897== by 0x3365E3: run_eval_code_obj (pythonrun.c:1165)
==355897== by 0x35C853: run_mod (pythonrun.c:1187)
==355897== by 0x21D38F: pyrun_file (pythonrun.c:1084)
Not sure if this is a hint, but 0x130 seems like a very low number for a memory address (from the message "Access not within mapped region at address 0x130").
Can anyone help? I'm using Ubuntu 20.04 with gcc 9.3.0 if that helps.