Train multiple models on multiple GPUs

(Boyu Zhang) #1

Is it possible to train multiple models on multiple GPUs where each model is trained on a distinct GPU simultaneously?

For example, suppose there are two GPUs:

model1 = model1.cuda(0)
model2 = model2.cuda(1)

Then train these two models simultaneously using the same dataloader.


It should work! You have to make sure the Variables/Tensors are located on the right GPU.
Could you explain a bit more about your use case?
Are you merging the outputs somehow or are the models completely independent from each other?

(Boyu Zhang) #3

Hi ptrblck, thanks for your reply. The models are completely independent of each other, but in some training steps they transfer information between each other, so I need to train them simultaneously. By the way, if I want to train all the models simultaneously, how should I write the code? Currently my code looks like the following, but I guess the models are trained sequentially:

model1 = model1.cuda(0)
model2 = model2.cuda(1)
models = [model1, model2]

for (input, label) in data_loader:
    for m in models:
        output = m(input)
        loss = criterion(output, label)


I think in your current implementation you would indeed have to wait until the optimization was done on each GPU.
If you just have two models, you could push each input and target tensor to the appropriate GPU and call the forward passes one after the other.
Since these calls are performed asynchronously, you could achieve a speedup in this way.
The code should look like this:

input1 = input.to('cuda:0')
input2 = input.to('cuda:1')
# same for label

output1 = model1(input1)  # should be an asynchronous call
output2 = model2(input2)

Unfortunately I cannot test it at the moment. Would you run it and check if it’s suitable for your use case?
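
A slightly fuller sketch of the same idea, including the backward pass and optimizer step, might look like the following. The toy `nn.Linear` models, the random batches standing in for the dataloader, and the helper names `pick_devices` and `train_step` are assumptions made for illustration, not part of the original code. The sketch falls back to the CPU when fewer than two GPUs are present, so it only demonstrates overlap on a machine with two GPUs.

```python
import torch
import torch.nn as nn


def pick_devices():
    # Use two distinct GPUs when available; otherwise fall back to the CPU
    # so the sketch still runs (without any overlap, of course).
    if torch.cuda.device_count() >= 2:
        return [torch.device("cuda:0"), torch.device("cuda:1")]
    return [torch.device("cpu"), torch.device("cpu")]


def train_step(models, optimizers, criterion, inputs, labels, devices):
    losses = []
    # Queue forward, backward and optimizer step for each model on its own device.
    # CUDA kernels are launched asynchronously, so the host can start queueing
    # work for the second GPU while the first one is still computing.
    for model, optimizer, device in zip(models, optimizers, devices):
        x = inputs.to(device)
        y = labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        losses.append(loss)  # keep the tensor; .item() here would force a sync
    # Read the loss values only after all work has been queued.
    return [loss.item() for loss in losses]


torch.manual_seed(0)
devices = pick_devices()
models = [nn.Linear(8, 2).to(device) for device in devices]
optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
criterion = nn.CrossEntropyLoss()

# Stand-in for a real DataLoader: three random batches of 16 samples.
data_loader = [(torch.randn(16, 8), torch.randint(0, 2, (16,))) for _ in range(3)]

history = [
    train_step(models, optimizers, criterion, inputs, labels, devices)
    for inputs, labels in data_loader
]
```

The key design choice is to avoid anything that synchronizes with the GPU (such as `loss.item()` or printing a tensor) inside the per-model loop, since that would make the host wait for the first GPU before launching work on the second.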


Hi!
I am still interested in this topic. I am very new to PyTorch and would like to train different models in parallel on different GPUs (i.e. one model per GPU) for hyperparameter search, or simply to get results for different weight initializations. I know there is a lot of documentation on multiprocessing and on existing hyperparameter-tuning frameworks, which I have already checked. However, I only have a limited amount of time and am therefore looking for the simplest way to achieve this. Any help would be greatly appreciated. Thank you for your attention.

(Boyu Zhang) #6

You can look at Horovod, which is developed by Uber. It makes parallel training extremely easy.
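
For fully independent runs (hyperparameter search or different seeds), another simple route that needs no extra framework is to start one process per GPU and restrict each process to a single device through the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below assumes a hypothetical training script named `train.py` that accepts `--lr` and `--seed` arguments; the script name, its flags, and the helper names `build_jobs` and `run_jobs` are illustrative only.

```python
import os
import subprocess
import sys


def build_jobs(configs, gpu_ids, script="train.py"):
    """Pair each config with one GPU and build its command line and environment."""
    jobs = []
    for cfg, gpu in zip(configs, gpu_ids):
        # The child process only sees this one GPU, which it addresses as cuda:0.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        cmd = [sys.executable, script] + [f"--{key}={value}" for key, value in cfg.items()]
        jobs.append((cmd, env))
    return jobs


def run_jobs(jobs):
    """Start all processes first, then wait for them, so the runs overlap."""
    procs = [subprocess.Popen(cmd, env=env) for cmd, env in jobs]
    return [proc.wait() for proc in procs]


# Hypothetical hyperparameter settings, one per GPU.
configs = [{"lr": 0.1, "seed": 0}, {"lr": 0.01, "seed": 1}]
jobs = build_jobs(configs, gpu_ids=[0, 1])
# exit_codes = run_jobs(jobs)  # uncomment once the (hypothetical) train.py exists
```

Because each run lives in its own process, the training script itself can stay a plain single-GPU script that always uses `cuda:0`.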

On Fri, Mar 15, 2019 at 3:34 AM, Maxence_Ernoult via PyTorch Forums <noreply@discuss.pytorch.org> wrote: