Runtime error in multithreaded inference scenario

I am trying to run a PyTorch model for inference in a multithreaded scenario, where multiple threads predict using a shared model.

In the non-concurrent case, inference runs perfectly fine, but with multiple threads I occasionally get the following error in different threads:

```
RuntimeError: The expanded size of the tensor (100) must match the existing size (3) at non-singleton dimension 1. Target sizes: [40, 100, 300]. Tensor sizes: [16, 3, 1]
```
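The setup looks roughly like this (simplified sketch; `predict` here is a stand-in placeholder for the real call on the shared PyTorch module, which I have omitted):

```python
import threading

def predict(batch):
    # Stand-in for the shared model's forward pass (placeholder --
    # the real code calls the torch.nn.Module on a batched tensor).
    return [x * 2 for x in batch]

results = {}

def worker(tid, batch):
    # Each thread calls the shared model concurrently; this is where
    # the RuntimeError appears intermittently in the real code.
    results[tid] = predict(batch)

threads = [threading.Thread(target=worker, args=(i, [i, i + 1])) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```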

I have experimented a little, and putting a mutex around the prediction call solves the problem, but that defeats the purpose of having multiple threads.
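Concretely, the workaround I tested wraps the call in a `threading.Lock` (again a sketch with a stand-in `predict`; the real call is on the shared model):

```python
import threading

lock = threading.Lock()

def predict(batch):
    # Stand-in for the real model inference (hypothetical placeholder).
    return [x * 2 for x in batch]

results = {}

def worker(tid, batch):
    # Holding the lock serializes inference, so the error goes away,
    # but the threads no longer run predictions in parallel.
    with lock:
        results[tid] = predict(batch)

threads = [threading.Thread(target=worker, args=(i, [i])) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)
```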

I have two questions:

  1. What is the recommended way to have concurrent threads share the same model?
  2. Do I need to change any configuration-related settings?