How to serve the same dataset to models on each GPU?

On the same machine, there are four different models, one on each GPU.
They all use the same dataset, but I don't know how to serve it efficiently.

Naively, I could make four copies of the dataset and create four Python processes to train them.
But is that the best way? Do all four copies of the dataset really have to be duplicated in memory?

If the dataset won't be mutated, it doesn't have to be duplicated.
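
For example, a minimal sketch of sharing one in-memory dataset across four training processes, assuming the dataset fits in host RAM as tensors; the names `train_one_model` and `NUM_GPUS` are made up for illustration:

```python
# Sketch: one shared copy of the data, four worker processes (one per GPU).
import torch
import torch.multiprocessing as mp

NUM_GPUS = 4

def train_one_model(rank: int, data: torch.Tensor, labels: torch.Tensor):
    device = torch.device(f"cuda:{rank}")
    # `data` and `labels` are backed by shared memory, so every worker
    # reads the same physical pages instead of holding its own copy.
    model = torch.nn.Linear(data.size(1), 10).to(device)
    # ... build a DataLoader and run the training loop for this model ...

if __name__ == "__main__":
    data = torch.randn(100_000, 128)
    labels = torch.randint(0, 10, (100_000,))
    # Move the backing storage into shared memory once, before spawning.
    data.share_memory_()
    labels.share_memory_()

    mp.set_start_method("spawn", force=True)
    procs = [mp.Process(target=train_one_model, args=(rank, data, labels))
             for rank in range(NUM_GPUS)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

`torch.multiprocessing` passes shared-memory tensors to child processes by reference rather than pickling their contents, which is what avoids the duplication.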

Naively, I could make four copies of the dataset and create four Python processes to train them.
But is that the best way?

This approach has pros and cons. It will certainly use more memory, both for the duplicated dataset and for the per-process CUDA context (roughly 500MB of GPU memory each). The benefit is that there won't be any GIL contention. Another option is to use multiple threads with multiple CUDA streams. That way, the dataset and CUDA context can be shared, and the threads still allow concurrent processing on the CUDA devices. However, the CPU computations from the threads will still compete for the GIL.
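
Here is a minimal sketch of that threaded alternative, assuming CUDA is available and four GPUs are present; `worker` and the toy `Linear` models are placeholders, not a complete training loop:

```python
# Sketch: one process, one thread per GPU, each thread on its own CUDA stream.
import threading
import torch

def worker(device_id: int, model: torch.nn.Module, data: torch.Tensor):
    device = torch.device(f"cuda:{device_id}")
    stream = torch.cuda.Stream(device=device)
    model = model.to(device)
    with torch.cuda.stream(stream):
        # Kernels launched here go onto this thread's private stream, so the
        # four models run concurrently while sharing a single process (one
        # CUDA context per device, one copy of the dataset in host memory).
        out = model(data.to(device, non_blocking=True))
        loss = out.sum()
        loss.backward()
    stream.synchronize()

data = torch.randn(1024, 128)  # shared by all threads, never copied
threads = [
    threading.Thread(target=worker, args=(i, torch.nn.Linear(128, 10), data))
    for i in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Note that the GPU-side work overlaps fine here; it is the Python-level work (data augmentation, batching, bookkeeping) that serializes on the GIL, which is the trade-off described above.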
