Releasing GIL in C++ extension

Recently, I updated PyTorch from 0.3 to 0.4, so I also rewrote my C extension using the new C++ extension API in 0.4. Previously, if I wrapped my C extension with DataParallel, the CPU usage of my script could go above 100% (around 170%). However, when I wrapped the new C++ extension with DataParallel, the CPU usage could not go above 100% and training slowed down. After doing some research, I suspected this was related to the GIL, and I found a way to release the GIL in my C++ source files.

    m.def("forward", &foo_forward, "Foo Forward", py::call_guard<py::gil_scoped_release>());
    m.def("backward", &foo_backward, "Foo Backward", py::call_guard<py::gil_scoped_release>());

After I added py::call_guard<py::gil_scoped_release>() to each m.def call, the CPU usage could go above 100% again and the training speed returned to normal. I just wonder whether this could cause any problems. Thanks in advance.
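For context, here is a minimal Python sketch (not from the original post) of why holding the GIL caps a multi-threaded script near 100% CPU: DataParallel drives each GPU from a separate Python thread, and if the extension holds the GIL those threads serialize. The `busy()` helper is a hypothetical stand-in for a CPU-bound extension call.

```python
import threading

def busy(n):
    # Hypothetical stand-in for a CPU-bound extension call.
    # Pure-Python loops hold the GIL, so two threads running this
    # serialize and total CPU usage stays near 100%.
    s = 0
    for i in range(n):
        s += i * i
    return s

results = [None, None]

def worker(idx):
    results[idx] = busy(200_000)

# DataParallel similarly launches one Python thread per GPU.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Releasing the GIL inside the extension (as with py::gil_scoped_release) lets such threads actually run in parallel, which is why CPU usage can exceed 100% again.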


This should be fine. There shouldn’t be any PyTorch-specific problems, but of course all the typical things to avoid when releasing the GIL in pybind11 still apply (for example, don’t touch Python objects while the GIL is released).


Can you please share some code showing how you added the line? Were you using CUDA, or did you run your network on the CPU?

I was curious why my old C extension did not suffer from the GIL. It looks like ffi, which was previously used to compile PyTorch extensions, releases the GIL by default. Is that correct?

I was just following the tutorial here and added py::call_guard<py::gil_scoped_release>() as the last argument to m.def in PYBIND11_MODULE.

That extension runs on the GPU, but it has some looping, so it is CPU intensive as well.


Do you know of any documentation for the m.def arguments?
What are the implications? I wonder if it would break anything, assuming we stick to ATen and kernels (no Python object references).

PyTorch uses a library called pybind11 to create Python bindings; you can find the documentation for m.def there. I am not sure if releasing the GIL would break anything.


Yeah it looks like ffi automatically releases the GIL for C function calls.
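For what it’s worth, the standard-library ctypes module behaves the same way: like ffi, it releases the GIL for the duration of a foreign call, so CPU-bound C code can run on several threads in parallel. A small sketch (the libm lookup is an assumption about the host system):

```python
import ctypes
import ctypes.util

# ctypes drops the GIL while the foreign call runs, just as ffi-based
# wrappers do, which is why old C extensions didn't hit this problem.
# Falling back to the main program's symbols if libm isn't found by
# name is an assumption that works on typical Linux/macOS systems.
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(9.0))
```

With pybind11, in contrast, the GIL is held across the call by default, which is why the explicit py::call_guard<py::gil_scoped_release>() is needed.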