I'm currently building an image search engine with PyTorch and serving the model with Flask. I have a single instance of the model, held as a global variable, and every incoming request uses it.
I'm wondering whether a PyTorch model is thread safe, or whether I need a mutex around inference, since another thread might be running the model at the same time.
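For context, here's a minimal sketch of the kind of setup I mean (the model path, endpoint name, and preprocessing are placeholders, not my actual code):

```python
import io

import torch
import torchvision.transforms as T
from flask import Flask, jsonify, request
from PIL import Image

app = Flask(__name__)

# One global model instance shared by all request-handling threads.
model = torch.jit.load("model.pt")  # placeholder; could be torch.load(...) etc.
model.eval()

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor()])

@app.route("/search", methods=["POST"])
def search():
    img = Image.open(io.BytesIO(request.files["image"].read())).convert("RGB")
    batch = preprocess(img).unsqueeze(0)  # shape: (1, 3, 224, 224)
    with torch.no_grad():                 # inference only, no autograd state
        embedding = model(batch)
    return jsonify(embedding.squeeze(0).tolist())

if __name__ == "__main__":
    # Flask's threaded mode means concurrent requests share `model`.
    app.run(threaded=True)
```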
In general, a CPU model should be thread safe (there are exceptions, though; some people report that mixing MKL-backed functions with multiprocessing causes hangs). If you're running CUDA models on a single GPU, you'll get better performance by not running multiple forward passes at the same time, so a mutex would be a good idea there. If you're running CUDA models across multiple GPUs, that will probably deadlock due to the NCCL backend.
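For the single-GPU case, a hypothetical sketch of that mutex (the `model_lock` and `run_inference` names are illustrative, not part of any PyTorch API):

```python
import threading

import torch

# A process-wide lock so only one thread runs the model at a time.
model_lock = threading.Lock()

def run_inference(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    # Serializing forward passes avoids contention between concurrent
    # requests on a single GPU.
    with model_lock:
        with torch.no_grad():
            return model(batch.to("cuda"))
```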
Hi @richard, no, the server has no GPU, so everything runs on CPU only. I'll remove the mutex for now and see over time whether users hit any issues. Thanks for your help!