Came across a problem: in PyTorch 1.1.0 with uWSGI + Flask on CPU, even torch.cat does not work (everything just freezes, without errors). I determined that the problem is in uWSGI/Flask, because in the same environment, without them, I was able to run the same operations without any issues. There was no such problem in previous versions of PyTorch. As far as I understand, I am dealing with multiprocessing artifacts…
Nevertheless, I found the solution, and it was simple:
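(The code itself is not quoted in this excerpt; judging from the replies below, the fix involved deferring model construction and load_state_dict into Flask's before_first_request hook, so they run inside each uWSGI worker after the fork rather than at import time. Below is a minimal sketch of that pattern; Segmentor and model.pth are placeholder names, not taken from the original post, and note that before_first_request was removed in Flask 2.3.)

```python
# Sketch only: defer model construction and load_state_dict until the first
# request, so they run inside each uWSGI worker (after fork), not at import time.
# Segmentor and "model.pth" are placeholders, not names from the original post.
import torch
from flask import Flask, jsonify

app = Flask(__name__)
model = None  # filled in lazily, once per worker process


class Segmentor(torch.nn.Module):  # stand-in for the real model class
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 1, kernel_size=1)

    def forward(self, x):
        return self.conv(x)


@app.before_first_request  # available in the Flask versions of this thread; removed in Flask 2.3
def load_model():
    global model
    model = Segmentor()
    state = torch.load("model.pth", map_location="cpu")  # placeholder path
    model.load_state_dict(state)
    model.eval()


@app.route("/predict")
def predict():
    with torch.no_grad():
        out = model(torch.zeros(1, 3, 224, 224))  # dummy input for illustration
    return jsonify(shape=list(out.shape))
```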
@lebionick thanks for the info. I actually have the exact same project structure as your code; my Segmentor is just named differently. I had another solution for another app, but I changed it to be exactly like yours, with before_first_request, still with no success. I will report back if I find out more about this.
OK, it seems that load_state_dict and other operations in __init__, when run after flask.Flask(__name__), cause PyTorch operations done in requests to hang forever. I'm not sure about the cause, but maybe this info could help someone.
Aren't you running the Flask app with uWSGI's preforking worker mode, which is the default config?
Try lazy-apps mode, so that each worker loads the trained model for itself and does not share it with the others.
It works in my environment even when I load the model at the global (module) level.
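For reference, a minimal uwsgi.ini sketch with lazy-apps enabled; the module, port, and process count are placeholders for your own setup, and only the lazy-apps line is the relevant change:

```ini
[uwsgi]
# load the application (and therefore the model) in each worker after fork,
# instead of loading it once in the master and forking afterwards
lazy-apps = true
# placeholder: Flask instance "app" inside wsgi.py
module = wsgi:app
master = true
processes = 4
enable-threads = true
http = :8000
```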
I found this thread really interesting because I am working on scaling my inference server (which uses uWSGI + Flask + PyTorch on AWS with Elastic Inference), and when I increased the number of processes recently I came across some intermittent issues:
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at inline_container.cc:316] . PytorchStreamWriter failed writing central directory: file write failed
I couldn't find any other references to this sort of error, but I am guessing it could be some kind of concurrency issue, since it appeared as I increased the number of processes.
I found a different solution that works in my case and still maintains the default fork model of uWSGI (i.e. without lazy-apps). Enabling lazy-apps may not be desirable because worker processes will not share resources with their parent (e.g. read-only resources such as models used for inference).
The idea is to share only the state_dict between processes; each worker process then calls model.load_state_dict(global_state) independently, but only once, during the first request. This is opposed to doing load_state_dict in the parent process and sharing the resulting model with the worker processes.
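A sketch of that idea under uWSGI's default preforking model, reusing the same placeholder names as above (Segmentor, model.pth); here the per-worker lazy initialization is done with a small helper rather than any particular Flask hook:

```python
# Sketch only: torch.load runs once at import time (in the uWSGI master, before
# fork), so the raw state_dict tensors are shared with the workers via
# copy-on-write; load_state_dict then runs independently in each worker,
# once, on its first request.
import torch
from flask import Flask, jsonify


class Segmentor(torch.nn.Module):  # stand-in for the real model class
    def __init__(self):
        super().__init__()
        self.conv = torch.nn.Conv2d(3, 1, kernel_size=1)

    def forward(self, x):
        return self.conv(x)


app = Flask(__name__)

# loaded in the parent process; only ever read, so the pages stay shared
global_state = torch.load("model.pth", map_location="cpu")  # placeholder path

model = None  # constructed lazily, once per worker


def get_model():
    global model
    if model is None:
        model = Segmentor()
        model.load_state_dict(global_state)  # per worker, on first use
        model.eval()
    return model


@app.route("/predict")
def predict():
    with torch.no_grad():
        out = get_model()(torch.zeros(1, 3, 224, 224))  # dummy input
    return jsonify(shape=list(out.shape))
```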
I just ran into a similar issue running PyTorch with uWSGI in a container. This issue (Launching two processes causes hanging · Issue #50669 · pytorch/pytorch · GitHub) indicates that using LD_PRELOAD to load Intel's OMP instead of libgomp can avoid the problem, which is in libgomp. On Debian Bullseye I was able to install libomp-dev-11 and run LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libiomp5.so /usr/bin/uwsgi --ini /uwsgi.ini --uid www-data --enable-threads to work around the issue.