Run a torch model inside a parallelized loop

Hi everyone,

I run a Python loop that is parallelized with the multiprocessing library. Each iteration of the loop calls a trained torch model to make a prediction. Everything runs on CPU.

    for _ind in inds:
        # process each sample
        f = pool.apply_async(function_launching_torch_model,

The torch model turns out to be very slow in this setup, much slower than when the loop runs sequentially. I guess there’s a “conflict” in how the worker processes use the CPU cores?
Does anyone have a suggestion for keeping the loop parallel while still getting good performance?
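
For context, here is a minimal sketch of the kind of setup I mean, together with one guess I have not verified: that each worker starts its own set of torch threads and the workers then oversubscribe the cores. Under that assumption, the sketch limits every worker to one torch thread with torch.set_num_threads(1). The names init_worker and predict, the Linear layer standing in for the trained model, and the dummy samples are all hypothetical placeholders, not my real code.

```python
import multiprocessing as mp

import torch

_model = None  # one model per worker process, set by init_worker


def init_worker():
    """Runs once in each worker: cap torch threads, then build the model."""
    global _model
    # Assumption: the slowdown comes from thread oversubscription, so each
    # worker is limited to a single intra-op torch thread.
    torch.set_num_threads(1)
    # Hypothetical stand-in for loading the trained model.
    _model = torch.nn.Linear(4, 1)
    _model.eval()


def predict(sample):
    """Hypothetical stand-in for function_launching_torch_model."""
    with torch.no_grad():
        x = torch.tensor(sample, dtype=torch.float32)
        return _model(x).item()


if __name__ == "__main__":
    inds = [[float(i)] * 4 for i in range(8)]  # dummy samples
    with mp.Pool(processes=2, initializer=init_worker) as pool:
        futures = [pool.apply_async(predict, (s,)) for s in inds]
        results = [f.get() for f in futures]
    print(len(results))
```

The model is built inside the initializer so that each worker has its own copy and the parent process does not run any torch computation before the workers start. Whether one thread per worker is the right setting presumably depends on the number of cores and workers.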

Thanks in advance!