Using Pytorch Autograd in Flask App: GIL issues

Hello,
I am currently working on a small web application to visualize CNNs. My Flask application works perfectly in development mode, but once I deployed it, I got an error when calling the backward method:

RuntimeError: The autograd engine was called while holding the GIL. 
If you are using the C++ API, the autograd engine is an expensive operation that does not require the GIL to be held so you should release it with 'pybind11::gil_scoped_release no_gil;'. 
If you are not using the C++ API, please report a bug to the pytorch team.

After some tests, I found that the error is raised when the backward method is called. For example, the simple code below raises the error:

import torch
from flask import Flask

app = Flask(__name__)

@app.route("/api/test")
def test():
    # Some basic operations that do not take too much RAM
    tensor1 = torch.tensor([1.], requires_grad=True)
    tensor2 = torch.tensor([2.], requires_grad=True)
    prediction = tensor1 * 10 + tensor2 * 2
    prediction.backward()

    return {
        "status": "done"
    }

Why is this happening? When I run the server in development mode, everything works as expected.
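For comparison, here is the same computation run as a plain script outside Flask (a minimal sketch on my part, not from the deployed app) — it runs without error, which suggests the autograd call itself is fine and the problem comes from the deployment setup:

```python
import torch

# Same operations as in the Flask route, run as a standalone script.
tensor1 = torch.tensor([1.], requires_grad=True)
tensor2 = torch.tensor([2.], requires_grad=True)
prediction = tensor1 * 10 + tensor2 * 2
prediction.backward()

# prediction = 1 * 10 + 2 * 2 = 14, and the gradients are the coefficients.
print(tensor1.grad)  # tensor([10.])
print(tensor2.grad)  # tensor([2.])
```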

I am running this on a VPS with Ubuntu 20.04, two cores, and 2 GB of RAM (CPU only).

Thanks!

Full log:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/dist-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/dist-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/var/www/ailia/backend/api/analysis.py", line 46, in test
    prediction.backward()
  File "/usr/local/lib/python3.8/dist-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/usr/local/lib/python3.8/dist-packages/torch/autograd/__init__.py", line 130, in backward
    Variable._execution_engine.run_backward(
RuntimeError: The autograd engine was called while holding the GIL. If you are using the C++ API, the autograd engine is an expensive operation that does not require the GIL to be held so you should release it with 'pybind11::gil_scoped_release no_gil;'. If you are not using the C++ API, please report a bug to the pytorch team.

Hi,

This is very surprising indeed. I guess Flask must be doing something quite funky with the GIL, as we explicitly release it just before this call.
Do you know whether Flask has some custom behavior with respect to the GIL, and how it reports it?
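One way to probe whether the deployment server handles threads and the GIL normally (my own diagnostic sketch, not something from this thread) is to check that a plain background thread actually gets scheduled inside a request handler; some WSGI servers are known to restrict threading unless explicitly configured:

```python
import threading

# If the interpreter schedules threads normally, the worker runs
# and sets the event almost immediately.
done = threading.Event()

def worker():
    done.set()

t = threading.Thread(target=worker)
t.start()
t.join(timeout=1.0)
print("background threads run:", done.is_set())
```

If this prints False inside the deployed app but True in development mode, the WSGI server's thread/GIL configuration is a likely suspect.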


Hi,
Thanks for your answer.
I apologize for opening an issue on both the PyTorch forums and GitHub; I did not know which platform was more appropriate.
I will redirect this discussion to GitHub issue 47575.


Sounds good.
If you have any ideas about my question above, it would be great if you could post an answer on the GitHub issue!
