How to fully use multiple GPUs when training multiple networks


I am trying to train multiple neural networks on a machine with multiple GPUs. I have already used the DataParallel module to parallelize this process. My understanding of DataParallel is that it can only parallelize a single model's training across the GPUs, so the models still have to be trained one after another. As a result, the GPUs are not fully utilized when I train these networks sequentially.

I am wondering if there is any way to train multiple networks at the same time and fully use all my GPUs. I thought about using torch.multiprocessing, but it seems pretty painful to use in my case.

Any suggestions would be really appreciated. Thanks!

Are these networks somehow connected to each other, i.e. does model1 feed its output to model2?
If so, could you post an explanation of your workflow?

If the networks are completely standalone models, you could run multiple scripts, specifying the GPU which should be used with: CUDA_VISIBLE_DEVICES=device_id python, where device_id has to be set to the appropriate GPU id.
You could also set the device in your script with:

import os
# Must be set before CUDA is initialized (i.e. before the first CUDA call);
# '0' makes only physical GPU 0 visible to this process.
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
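If you want a single parent script instead of launching each job by hand, you could spawn one child process per GPU with the environment variable set per child. A minimal sketch (the training script name would be your own; here the child just echoes the variable so you can see what each process observes):

```python
import os
import subprocess
import sys

def launch_on_gpu(script_args, gpu_id):
    """Start a child Python process that can only see one physical GPU.

    Inside the child, CUDA renumbers the visible device to cuda:0,
    so the training script itself never needs to know the real GPU id.
    """
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return subprocess.Popen(
        [sys.executable] + script_args,
        env=env,
        stdout=subprocess.PIPE,
        text=True,
    )

# For a real run this would be e.g. launch_on_gpu(["train.py"], gpu_id)
# for each GPU. Here each child just prints the GPU it was pinned to:
probe = "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"
procs = [launch_on_gpu(["-c", probe], gpu_id) for gpu_id in range(4)]
for gpu_id, p in enumerate(procs):
    out, _ = p.communicate()
    print(f"child pinned to GPU {gpu_id} sees CUDA_VISIBLE_DEVICES={out.strip()}")
```

Since the children run as separate processes, they avoid the GIL entirely and each training keeps its GPU busy independently.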

Thanks a lot for your response!

These networks are not connected to each other; they are completely standalone models. However, every once in a while, I need to manage them. Basically, I am using different types of optimizers to train multiple neural networks with the same architecture, say ResNet-50. After every 5 epochs, I need to observe the training accuracies of these models and then modify the optimizers, or even delete some of them if they perform too poorly.
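The pruning step every 5 epochs could be as simple as filtering a dictionary of collected results. A sketch under the assumption that accuracies are gathered into a dict keyed by optimizer name (the names and numbers here are made up):

```python
def prune_underperformers(accuracies, threshold):
    """Drop models whose training accuracy after the last 5-epoch
    window falls below the threshold; keep the rest for the next round."""
    return {name: acc for name, acc in accuracies.items() if acc >= threshold}

# Hypothetical accuracies after 5 epochs, keyed by optimizer name:
accuracies = {"sgd": 0.72, "adam": 0.81, "rmsprop": 0.43}
survivors = prune_underperformers(accuracies, threshold=0.5)
print(sorted(survivors))  # the models that continue training
```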

Hence, I need to train all of them and keep track of both the performance and the weights of these networks so I can continue the same process every 5 epochs. Based on what you are saying, should I specify the GPU for each of them in my script and then collect the performance after every 5 epochs? For example, if I have 50 such networks and 10 GPUs, would I need to loop through them in 50 / 10 = 5 rounds to gather all the results?
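That loop can be written as a simple round-robin schedule: split the model indices into ceil(50 / 10) = 5 rounds and pair each model in a round with a distinct GPU. A minimal sketch (pure Python, no torch), assuming one training process per (model, gpu) pair would be launched in each round:

```python
def schedule_rounds(num_models, num_gpus):
    """Group model indices into rounds of at most num_gpus models,
    assigning each model in a round to a distinct GPU id."""
    rounds = []
    for start in range(0, num_models, num_gpus):
        batch = range(start, min(start + num_gpus, num_models))
        rounds.append([(model, gpu) for gpu, model in enumerate(batch)])
    return rounds

rounds = schedule_rounds(50, 10)
print(len(rounds))    # 5 rounds
print(rounds[0][:3])  # [(0, 0), (1, 1), (2, 2)]

# In each round you would launch one training process per pair
# (e.g. with CUDA_VISIBLE_DEVICES as suggested above), wait for the
# round to finish, then collect checkpointed weights and accuracies
# before deciding which optimizers to modify or drop.
```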

However, this method seems a bit dirty and hard to manage. I am wondering if there is a better approach. Thanks again!