Is there a way to train independent models in parallel using the same dataloader?

I’m training multiple models on the same dataset. Currently I simply write separate scripts for these models and train them on a single GPU. But since they all use the same dataset, I think my current approach creates a lot of redundant overhead on the data loading side.

So I’m just wondering if there is a way to train multiple models under the same dataloader. An obvious way is to run the models sequentially inside the same dataloader iteration (see the sketch below), but would that use my GPU efficiently? My naive guess is that if the models could run in parallel within the same dataloader iteration, that would make full use of my single GPU.
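Something like this is what I mean by the sequential version; the models, optimizers, and dataset here are just toy placeholders to show the structure:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-ins for my real models/dataset, just to illustrate the loop.
dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True, num_workers=2)

models = [torch.nn.Linear(16, 2).cuda() for _ in range(3)]
optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
criterion = torch.nn.CrossEntropyLoss()

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    # Each batch is loaded once and reused by every model, one after another.
    for model, opt in zip(models, optimizers):
        opt.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        opt.step()
```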

If you are worried about host-to-device/device-to-host copies for one model blocking computation for another model, you can try using multiple CUDA streams, one stream per model. Operations issued on different streams can run concurrently.
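A minimal sketch of what that could look like, reusing the `models`, `optimizers`, `criterion`, and `loader` names from the question, and assuming the DataLoader was created with `pin_memory=True` so the `non_blocking` copies are actually asynchronous:

```python
import torch

# One stream per model; assumes models/optimizers/criterion/loader exist
# and the DataLoader uses pin_memory=True.
streams = [torch.cuda.Stream() for _ in models]

for inputs, targets in loader:
    for model, opt, stream in zip(models, optimizers, streams):
        with torch.cuda.stream(stream):
            # The H2D copy and this model's compute are issued on its own
            # stream, so they can overlap with the other models' work.
            x = inputs.cuda(non_blocking=True)
            y = targets.cuda(non_blocking=True)
            opt.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            opt.step()
    torch.cuda.synchronize()  # wait for all streams before the next batch
```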

No, you’d only amortize the data loading time. It may be worth it if data loading is a notable proportion of the total iteration time (data loading + forward + backward).

Yes, but they won’t run forward() in parallel unless you write code for it. The first problem is that you would need num_models times more GPU memory (probably more due to increased fragmentation); other complications include the Python GIL and the need for CUDA streams.
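For completeness, a rough sketch of the "write code for it" route: one Python thread plus one CUDA stream per model (again reusing the names from the question). Note the GIL still serializes the Python-side kernel launches, so any speedup comes from overlapping the GPU work itself:

```python
import threading
import torch

def train_step(model, opt, stream, inputs, targets, criterion):
    # Each thread issues its model's work on a dedicated stream.
    with torch.cuda.stream(stream):
        opt.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        opt.step()

streams = [torch.cuda.Stream() for _ in models]

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    threads = [
        threading.Thread(target=train_step,
                         args=(m, o, s, inputs, targets, criterion))
        for m, o, s in zip(models, optimizers, streams)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    torch.cuda.synchronize()  # make sure every model finished this batch
```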