I want to train multiple independent models on the GPU using multiprocessing. The data is the same for all the processes. The fork start method uses copy-on-write: since my data is read-only, the child processes never actually copy it. From the multiprocessing documentation, however, I see that CUDA only supports the spawn or forkserver start methods, and with these each child process gets its own separate copy of the data, so memory grows linearly with the number of processes.
How can I have common data used by all the child workers while training on the GPU? There is no communication between the processes and the data is read-only. If I don't use the GPU, the fork method works fine, but obviously I'm interested in making it work on the GPU.
I haven’t used nn.parallel.DistributedDataParallel, but I believe it is more suitable for training a single model while distributing the batches across different devices. My use case is different: I am training separate models on single/multiple GPUs, and the training data does not need to be distributed. Is it possible to use nn.parallel.DistributedDataParallel here?