Hi, I have a TorchScript transformer model and I'm trying to evaluate the performance of my C++ inference code. To that end, I'm running on a server with 32 CPUs (and doing inference on CPU). I have set OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1, but I still notice that running 30 inference processes in parallel makes each forward step significantly slower (up to 2x) than running the inference jobs one at a time.
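In case it helps, here is roughly how I launch the benchmark (a sketch, assuming Linux with `taskset` available; `./infer` and `model.pt` are placeholders for my actual binary and model, and I substitute a trivial `sleep` so the snippet runs as-is):

```shell
# Launch N single-threaded inference processes, one pinned per core,
# so the OS scheduler can't migrate them onto the same CPU.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
NPROC=2   # I use 30 on the 32-core server; 2 here so the sketch runs anywhere
for i in $(seq 0 $((NPROC - 1))); do
  # taskset pins process i to core i; in the real run this line is
  #   taskset -c "$i" ./infer model.pt &
  taskset -c "$i" sleep 0.1 &
done
wait   # block until all pinned workers have exited
echo "launched $NPROC pinned workers"
```

Pinning with `taskset` is one thing I've tried; without it, the per-process timings were even noisier.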
Is there something else I’m missing?