Run n models in parallel on a single GPU

I want to train n models (for each model I have f × t data points). I can fit all of the data onto a single GPU. I set up the dataloader so that each batch contains a number of minibatches, and each minibatch holds the data for training one model (one of the n). The data per model is rather small, but the number of models is large. The problem I am facing is that the batches are executed sequentially rather than in parallel on the single GPU. Is there a way to parallelize the batches on the single GPU so that this scales to a large number of models more quickly? The bare, individual training time per model is already improved by a factor of X when using the GPU over the CPU (without any parallelization), as in the sketch below.
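For reference, here is a minimal sketch of my current setup; the linear model, sizes, and single training step are hypothetical placeholders for my actual code:

```python
import torch
import torch.nn as nn

# Hypothetical sizes standing in for n, f, and t.
n_models, n_points, n_features = 100, 50, 8

# One small model and one small dataset per n; everything fits on the GPU.
device = torch.device("cuda:0")
models = [nn.Linear(n_features, 1).to(device) for _ in range(n_models)]
data = torch.randn(n_models, n_points, n_features, device=device)
targets = torch.randn(n_models, n_points, 1, device=device)
loss_fn = nn.MSELoss()

# Each "minibatch" holds the data for one model, and the models are trained
# one after another, so the GPU only ever runs one small step at a time.
for i, model in enumerate(models):
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
    optimizer.zero_grad()
    loss = loss_fn(model(data[i]), targets[i])
    loss.backward()
    optimizer.step()
```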

Thanks in advance.

I think you should be able to spawn multiple processes on a single GPU (using torch.multiprocessing - https://pytorch.org/docs/stable/multiprocessing.html) and train each model in a separate process. You may need to tune the number of processes you spawn, since too many processes can degrade performance due to resource contention.
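A minimal sketch of what I have in mind; the model, data shapes, and number of processes are placeholders, and you would substitute your own per-model training loop:

```python
import torch
import torch.multiprocessing as mp
import torch.nn as nn

def train_one_model(model_idx, x, y, num_epochs=100):
    # Each process trains its own small model on the shared GPU.
    device = torch.device("cuda:0")
    model = nn.Linear(x.shape[1], 1).to(device)  # placeholder architecture
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    x, y = x.to(device), y.to(device)
    for _ in range(num_epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    print(f"model {model_idx}: final loss {loss.item():.4f}")

if __name__ == "__main__":
    # CUDA tensors require the 'spawn' start method in child processes.
    mp.set_start_method("spawn", force=True)

    n_models = 8                       # hypothetical number of models
    n_points, n_features = 1000, 16    # hypothetical f x t data size per model
    datasets = [(torch.randn(n_points, n_features), torch.randn(n_points, 1))
                for _ in range(n_models)]

    processes = []
    for i, (x, y) in enumerate(datasets):
        p = mp.Process(target=train_one_model, args=(i, x, y))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```

If the total number of models is much larger than the number of processes the GPU can usefully run at once, you could launch them in chunks (or via a worker pool) instead of starting all n processes at the same time.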