I have a deployment scenario where multiple Python threads need to run inference with the same model. How should I structure the inference code so that several threads can safely share one model instance?
What I am observing is that some of the threads fail with the following error:

The expanded size of the tensor (100) must match the existing size (3) at non-singleton dimension 1.

The error does not occur when the same code is run sequentially, only when the threads run concurrently.
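For reference, here is a minimal sketch of the kind of structure I am trying to make work (the model, shapes, and names here are hypothetical stand-ins for my real code, assuming PyTorch): the model is shared read-only across threads, and each thread constructs its own input tensor rather than reusing any shared tensor.

```python
import threading
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model: one shared instance.
model = nn.Linear(3, 100)
model.eval()  # inference mode; no dropout/batch-norm updates

results = {}
results_lock = threading.Lock()

def worker(tid):
    # Each thread builds its OWN input tensor; the shared model is only read.
    x = torch.randn(1, 3)
    with torch.no_grad():  # no autograd state is recorded
        y = model(x)
    with results_lock:     # guard the shared results dict
        results[tid] = y.shape

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch every thread should get an output of shape (1, 100); my question is whether this per-thread-input pattern is the right way to avoid the size-mismatch error, or whether something else (e.g. a lock around the forward pass, or one model copy per thread) is required.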