Came across a problem: in pytorch 1.1.0 with uwsgi + flask on CPU, even torch.cat does not work (everything just freezes, without errors). I determined that the problem is in uwsgi/flask, because in the same environment I was able to do the same operations without any issues outside of them. There was no such problem in previous versions of pytorch. As far as I understand, I am dealing with multiprocessing artifacts…
Nevertheless, I found a solution, and it was simple:

```python
app = flask.Flask(__name__)
segmentator = None

@app.before_first_request
def load_segmentator():
    global segmentator
    segmentator = Segmentator()
```

Segmentator is a class wrapping pytorch’s nn.Module, which loads its weights in __init__.
Hope it helps somebody.
P.S. If someone explains what is going on here, I will be grateful.
I’m facing the same issue. I tried a similar solution but it still doesn’t seem to work. Any more information about this would be appreciated.
To be honest, I don’t really understand why it worked and cannot say much more in terms of code.
Here is my “code” inside the Segmentator class:
```python
def __init__(self, path):
    loaded_model = UNetWithResnet50Encoder(9)
    checkpoint = torch.load(path, map_location='cpu')
    # apply the checkpoint weights (exact checkpoint layout may differ)
    loaded_model.load_state_dict(checkpoint)
    self.model = loaded_model

# the inference method then loads an image into RAM, preprocesses it,
# passes it through self.model, and postprocesses the output
```
@lebionick thanks for the info. I actually have the exact same project structure as your code — Segmentor is just named differently. I had another solution for another app, but I changed it to be exactly like yours, with before_first_request, still to no success. I will report back if I find out more about this.
OK, it seems load_state_dict and other operations in __init__, when run after flask.Flask(__name__), cause pytorch operations done in requests to hang forever. Not sure about the cause, but maybe this info could help someone.
Aren’t you running the flask app with uWSGI’s “preforking worker mode”, which is the basic config? Try lazy-apps mode, in which each worker loads the trained model for itself and does not share it with the others. It works in my environment even when I load the model at global scope. Use the --lazy-apps command-line option, or in the ini file:
lazy-apps = true
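For reference, a minimal uwsgi.ini along these lines might look like the sketch below. The module name and worker count are placeholders I made up, not values from this thread:

```ini
[uwsgi]
# hypothetical Flask entry point (module:callable) — adjust to your app
module = myapp:app
master = true
processes = 4
# load the app (and thus the model) in each worker after fork,
# instead of loading once in the master and forking
lazy-apps = true
# allow Python threads, which pytorch may spawn
enable-threads = true
```

The trade-off is memory: with lazy-apps each worker holds its own full copy of the model instead of sharing copy-on-write pages with the master.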
Setting lazy-apps to true solved the problem I had with the flask + uwsgi + pytorch deployment. Thx.
I found this thread really interesting because I am working on scaling my inference server (which uses uWSGI + flask + PyTorch on AWS with Elastic Inference), and when I increased the # of processes recently I came across some intermittent issues:
terminate called after throwing an instance of 'c10::Error'
what(): [enforce fail at inline_container.cc:316] . PytorchStreamWriter failed writing central directory: file write failed
I couldn’t find any other references to this sort of error, but I am guessing it could be some kind of concurrency issue as I increased the number of processes.
I found a different solution that works in my case and still maintains the default fork model of uWSGI (i.e. without lazy-apps). lazy-apps may not be desirable because worker processes will not share resources with their parent (e.g. read-only resources such as models used for inference).
The idea is to share only the state_dict between processes; each worker process then does model.load_state_dict(global_state) independently, but only once, during the first request. This is opposed to doing load_state_dict in the parent process and sharing the resulting model with the worker processes.
I managed to replicate this behavior in this StackOverflow post.
I hope this helps others as well!
I just ran into a similar issue running pytorch with uwsgi in a container. This issue (Launching two processes causes hanging · Issue #50669 · pytorch/pytorch · GitHub) indicates that using LD_PRELOAD to load Intel’s OpenMP runtime instead of libgomp can avoid the hang, which is in libgomp. On Debian Bullseye I was able to install libomp-dev-11 and run

LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libiomp5.so /usr/bin/uwsgi --ini /uwsgi.ini --uid www-data --enable-threads

to work around the issue.
FWIW — I created a github repo to demonstrate the issue and workaround here: GitHub - mneilly/pytorch-libgomp-hang-and-workaround-example: This is an example of the libgomp hang issue encountered in pytorch