I want to train n models (for each n, I have f × t data points). I can load all of the data onto a single GPU. I assign the dataloader batches, and each batch contains a number of minibatches. Each minibatch holds the data needed to train one model (one n). The data per model is rather small, but the number of models is large. The ‘problem’ I am facing is that the batches are executed sequentially rather than in parallel on the single GPU. Is there a way to parallelize the batches on the single GPU so that this scales more quickly to a large number of models? The bare, per-model training time already improves by a factor of X on the GPU compared to the CPU (without any parallelization).
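To make the setup concrete, here is a minimal sketch of the sequential loop I am describing. The dimensions, `make_model`, and the synthetic data are placeholders standing in for my actual models and dataloader, not the real code:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder sizes: n_models small models, each trained on its own
# minibatch of shape (f, t). These names are illustrative only.
n_models, f, t = 100, 32, 16

def make_model():
    # Stand-in for the actual per-model architecture.
    return nn.Sequential(nn.Linear(t, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)

models = [make_model() for _ in range(n_models)]
optimizers = [torch.optim.Adam(m.parameters(), lr=1e-3) for m in models]
loss_fn = nn.MSELoss()

# Synthetic data standing in for the real dataloader: one minibatch per model.
data = [(torch.randn(f, t, device=device), torch.randn(f, 1, device=device))
        for _ in range(n_models)]

# The models are handled one after another, so the GPU processes each small
# minibatch sequentially instead of training the models in parallel.
for (x, y), model, opt in zip(data, models, optimizers):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```

Each individual minibatch is too small to keep the GPU busy, which is why I am looking for a way to run the per-model updates concurrently rather than in this one-at-a-time loop.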
Thanks in advance.