I have a deployment scenario where multiple Python threads need to run inference with the same model. How should I structure the inference code so that several threads can safely share one model instance?
What I am observing is that some of the threads fail with the following error:

The expanded size of the tensor (100) must match the existing size (3) at non-singleton dimension 1.

The error does not occur when the same code is run sequentially, only when the threads run concurrently.
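For reference, here is a minimal sketch of the kind of structure I am trying to make work (the model, shapes, and names here are hypothetical stand-ins for my real code, assuming PyTorch): the model is shared read-only across threads, and each thread constructs its own input tensor rather than reusing any shared tensor.

```python
import threading
import torch
import torch.nn as nn

# Hypothetical stand-in for the real model: one shared instance.
model = nn.Linear(3, 100)
model.eval()  # inference mode; no dropout/batch-norm updates

results = {}
results_lock = threading.Lock()

def worker(tid):
    # Each thread builds its OWN input tensor; the shared model is only read.
    x = torch.randn(1, 3)
    with torch.no_grad():  # no autograd state is recorded
        y = model(x)
    with results_lock:     # guard the shared results dict
        results[tid] = y.shape

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In this sketch every thread should get an output of shape (1, 100); my question is whether this per-thread-input pattern is the right way to avoid the size-mismatch error, or whether something else (e.g. a lock around the forward pass, or one model copy per thread) is required.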