Thread Pool Executor + PyTorch

Has anyone used ThreadPoolExecutor with PyTorch?

Details:

  • I am using a single GPU (Nvidia/MPS)
  • I am trying to predict around 80 parameters (using 80 similar models). I have found that my current model and training workflow are not fully utilising the GPU.
  • Using RayTune, and also by simply running the script multiple times, I have noted that the total runtime would be lower if I could run multiple training loops at once, probably only around 2, maybe 3.
  • In my production environment, I cannot package RayTune, so using it is not an option.
  • I would ordinarily use Process Pool Executor, but this isn’t a possibility due to the GUI framework that I am using.

Is it possible to use ThreadPoolExecutor to run multiple training loops at the same time? (All the tasks are independent, and none relies on another.)
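Concretely, something like this minimal sketch is what I have in mind (placeholder model, data, and names, not my real code; the tiny Linear model just stands in for one of the 80 similar models):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

import torch


def train_one(model_id: int, epochs: int = 10) -> float:
    """Train one small placeholder model and return its final loss."""
    device = "mps" if torch.backends.mps.is_available() else (
        "cuda" if torch.cuda.is_available() else "cpu"
    )
    model = torch.nn.Linear(16, 1).to(device)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.randn(256, 16, device=device)
    y = torch.randn(256, 1, device=device)
    loss = torch.tensor(0.0)
    for _ in range(epochs):
        opt.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()
    return loss.item()


if __name__ == "__main__":
    # Only 2-3 workers, as noted above; all 80 tasks are independent.
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = {pool.submit(train_one, i): i for i in range(80)}
        for fut in as_completed(futures):
            print(f"model {futures[fut]} finished, loss={fut.result():.4f}")
```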

Can confirm - both approaches work.

However, Thread Pool Executor results in limited performance improvements (as expected given the GIL).

Process Pool Executor is the way to go.

The results are particularly pronounced on Apple silicon when using MPS: the efficiency gains are nearly 4x when running 4 fairly sizeable models in parallel.
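For reference, a minimal sketch of the same pattern with ProcessPoolExecutor. Each worker is a separate interpreter, so the GIL is no longer the bottleneck. Here `train_one` is assumed to be the placeholder function from the sketch above, placed in a hypothetical module I am calling `train_worker` so it can be pickled by the worker processes:

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

# Hypothetical module holding the train_one placeholder from the sketch above;
# worker functions must be importable at module level for pickling.
from train_worker import train_one

if __name__ == "__main__":  # guard is required with the spawn start method on macOS/Windows
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(train_one, i): i for i in range(80)}
        for fut in as_completed(futures):
            print(f"model {futures[fut]} finished, loss={fut.result():.4f}")
```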