I am working on an ensemble of deep CNNs with 4 GPUs.
I would like to know whether it is better/faster to train each model parallelized across the 4 devices, one after the other sequentially, or to train each model (suppose ensemble size = 4) on its own GPU.
Note: All the individual networks are exposed to the entire dataset.
My current code is:
```python
cuda = torch.cuda.is_available()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
gpus = True if torch.cuda.device_count() > 1 else False  # torch.cuda.device_count() = 4

ensemble = []
optimizers = []
for i in range(ensemble_size):
    model = ResNet()
    optimizers.append(optim.SGD(model.parameters(), learning_rate))
    model.to(device)
    if gpus:
        model = nn.DataParallel(model)
    ensemble.append(model)
```
If training each model on its own GPU is better, would this be correct?
```python
if gpus:
    with torch.cuda.device(i):
        model.to(device)
        # model = nn.DataParallel(model)  # I can't parallelize the batches now, right?
    ensemble.append(model)
```
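As a point of comparison, here is a minimal sketch of pinning each model to its own GPU by indexing the device directly instead of using the `torch.cuda.device` context manager. The `SmallNet` module and the hyperparameters are placeholders standing in for `ResNet()` and the real settings, and the code falls back to CPU when no GPU is present:

```python
import torch
import torch.nn as nn
import torch.optim as optim

class SmallNet(nn.Module):
    """Placeholder for ResNet(); any nn.Module works the same way."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(x)

ensemble_size = 4
n_gpus = torch.cuda.device_count()

ensemble, optimizers = [], []
for i in range(ensemble_size):
    # Pin model i to GPU i (wraps around if fewer GPUs; CPU fallback if none).
    device = torch.device(f'cuda:{i % n_gpus}') if n_gpus > 0 else torch.device('cpu')
    model = SmallNet().to(device)
    # Create the optimizer after .to(device) so it sees the moved parameters.
    optimizers.append(optim.SGD(model.parameters(), lr=0.01))
    ensemble.append(model)
```

With this layout each model's forward and backward pass runs entirely on its assigned device, so no cross-GPU batch splitting happens.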
- With `model.to(device)`, am I parallelizing each model across the 4 GPUs?
- I cannot make use of `nn.DataParallel(model)` to distribute the batches across the GPUs, since each GPU has to run the entire dataset for each model, right?
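To make the one-model-per-GPU idea concrete, this is the kind of training loop I have in mind: every batch is copied to each model's own device, so each network still sees the entire dataset. The dataset, models, and loss below are dummy stand-ins, not my actual code:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Dummy stand-ins for the real ResNet ensemble and dataset.
ensemble = [nn.Linear(8, 2) for _ in range(4)]
optimizers = [optim.SGD(m.parameters(), lr=0.01) for m in ensemble]
criterion = nn.CrossEntropyLoss()
batches = [(torch.randn(16, 8), torch.randint(0, 2, (16,))) for _ in range(3)]

for x, y in batches:
    for model, opt in zip(ensemble, optimizers):
        # Each model may live on a different GPU; look up its device
        # and copy the batch there, so every model trains on all data.
        dev = next(model.parameters()).device
        xb, yb = x.to(dev), y.to(dev)
        opt.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        opt.step()
```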
Thanks a lot and sorry for the long post!