Serving PyTorch model on Flask: Thread safety

Hi,

I am currently using PyTorch to build an image search engine, and I am serving the model with Flask. Right now I have a single instance of the model, and when a user sends a request, the server runs inference through that model, which is stored as a global variable.

I am just wondering: is a PyTorch model thread-safe, or would it be necessary to use a mutex around the model's forward pass, since another thread might be using it at the same time?
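For reference, this is roughly the setup. It is a minimal sketch, assuming a torchvision ResNet as a stand-in for the actual search-engine model; the `/embed` route and the preprocessing pipeline are placeholders, not the real code:

```python
# Minimal sketch of the setup: one global model shared by all
# request-handling threads. The model and route are hypothetical.
import io

import torch
import torchvision.models as models
import torchvision.transforms as T
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

# Single global model instance, shared across threads.
# Random weights are fine for a sketch; the real app would load its own.
model = models.resnet18()
model.eval()

preprocess = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
])

@app.route("/embed", methods=["POST"])
def embed():
    image = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    batch = preprocess(image).unsqueeze(0)  # shape (1, 3, 224, 224)
    with torch.no_grad():  # inference only, no autograd state
        features = model(batch)
    return jsonify(features.squeeze(0).tolist())
```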


Are you planning to run your model(s) on a GPU?

In general, a CPU model should be thread-safe (there are some exceptions, though: some people report that functions using MKL hang when combined with multiprocessing). If you're running CUDA models on one GPU, you will get better performance by not running multiple models at the same time, so it would be good to use a mutex there. If you're running CUDA models on multiple GPUs, that will probably deadlock due to the NCCL backend.
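A minimal sketch of the mutex approach, using `threading.Lock` from the standard library; the `model` global and the `predict` helper are hypothetical names, not anything prescribed by PyTorch or Flask:

```python
# Guard the shared model with a lock so only one request thread
# runs a forward pass at a time.
import threading

import torch

model_lock = threading.Lock()

def predict(batch: torch.Tensor) -> torch.Tensor:
    # Serializes inference across threads; on a single GPU this avoids
    # contention between concurrent forward passes.
    with model_lock, torch.no_grad():
        return model(batch)
```

Each Flask request handler would call `predict(...)` instead of invoking the model directly.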

Hi @richard, no, the server has no GPU, so it will be running on CPU only. I will remove the mutex for now and see over time whether users run into any issues. Thanks for your help 🙂

I think these links would also be helpful:
Deep-Learning-in-Production
WebDNN
Serve Models on Web