Hi all,
I am working on an ensemble of deep CNNs with 4 GPUs.
I would like to know whether it is better/faster to train the models one after the other, each parallelized across the 4 devices, or to train each model (suppose ensemble size = 4) on its own GPU.
Note: All the individual networks are exposed to the entire dataset.
My current code is:
```python
import torch
import torch.nn as nn
import torch.optim as optim

device = 'cuda' if torch.cuda.is_available() else 'cpu'
gpus = torch.cuda.device_count() > 1  # torch.cuda.device_count() == 4 here

ensemble = []
optimizers = []
for i in range(ensemble_size):
    model = ResNet()
    model.to(device)  # move parameters to the GPU before building the optimizer
    optimizers.append(optim.SGD(model.parameters(), lr=learning_rate))
    if gpus:
        # split each batch across all 4 GPUs for this model
        model = nn.DataParallel(model)
    ensemble.append(model)
```
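For context, with this setup the models would be trained one after the other, each one data-parallelized over all 4 GPUs. Roughly like this sketch (`train_loader`, `criterion`, and `num_epochs` stand in for my actual loader, loss, and schedule):

```python
# Option A: sequential training, each model spread over all 4 GPUs.
for model, optimizer in zip(ensemble, optimizers):
    model.train()
    for epoch in range(num_epochs):
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)  # DataParallel splits the batch
            loss.backward()
            optimizer.step()
```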
If it is better to put each model on its own GPU, would this be correct?
```python
if gpus:
    with torch.cuda.device(i):
        model.to(device)
    # model = nn.DataParallel(model)  # I can't parallelize the batches now, right?
    ensemble.append(model)
```
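Or would I have to address the devices explicitly instead? A sketch of what I mean, assuming `ensemble_size` equals the number of GPUs:

```python
# Option B (sketch): one ensemble member pinned to each GPU.
devices = [torch.device(f'cuda:{i}') for i in range(torch.cuda.device_count())]

ensemble = []
optimizers = []
for i in range(ensemble_size):
    model = ResNet().to(devices[i])  # model i lives on GPU i
    optimizers.append(optim.SGD(model.parameters(), lr=learning_rate))
    ensemble.append(model)
```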
- With `model.to(device)`, am I parallelizing each model across the 4 GPUs of the `device`?
- I cannot make use of `nn.DataParallel(model)` to distribute the batches across the GPUs, since each GPU has to run the entire dataset for its model, right?
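To make the second point concrete: each model would then see every batch, just on its own GPU, something like this sketch (reusing the `devices` list from above, with `train_loader` and `criterion` again as placeholders):

```python
# Each ensemble member trains on the full dataset, on its own GPU.
for i, (model, optimizer) in enumerate(zip(ensemble, optimizers)):
    model.train()
    for inputs, targets in train_loader:
        # copy the whole batch to GPU i instead of splitting it across GPUs
        inputs, targets = inputs.to(devices[i]), targets.to(devices[i])
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
```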
Thanks a lot and sorry for the long post!