Multi-GPU training vs. training on GPUs separately

I am training a model on miniImageNet and have access to a machine with two GPUs.

Should I write a script that trains a single model across both GPUs (multi-GPU training), or train a different model on each GPU in parallel?

If I go with training in parallel on the two GPUs, how do I select the devices so that Model A trains on GPU 0 and Model B trains on GPU 1?

I recommend starting by reading this article from the PyTorch docs, as it gives insight into what's available. I personally prefer the "Data Parallel Training" approach, but I have also tried splitting a model between devices.
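For the data-parallel route, a minimal `DistributedDataParallel` sketch could look like the one below. This assumes a single machine with two GPUs and the NCCL backend; the model, data, batch size, and port are placeholders, not something from the original post:

```python
# Minimal single-machine DistributedDataParallel sketch (assumptions:
# 2 GPUs, NCCL backend; model/data/hyperparameters are placeholders).
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def train(rank, world_size):
    # One process per GPU; the process rank selects the device.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")  # placeholder port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    device = torch.device("cuda", index=rank)
    model = nn.Linear(128, 10).to(device)  # placeholder model
    ddp_model = DDP(model, device_ids=[rank])

    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(10):  # placeholder training loop with random data
        inputs = torch.randn(32, 128, device=device)
        targets = torch.randint(0, 10, (32,), device=device)
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), targets)
        loss.backward()  # gradients are all-reduced across GPUs here
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2  # two GPUs on this machine
    mp.spawn(train, args=(world_size,), nprocs=world_size)
```

Each process sees only its own shard of each batch, and DDP synchronizes gradients during `backward()`, so the two GPUs stay in lockstep on one model.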

On a machine with multiple GPUs, you select a device either with strings like device='cuda:0', 'cuda:1', etc., or with torch.device('cuda', index=0) (I prefer the latter, but YMMV).
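For the train-separately option, a sketch along these lines should work; the two `nn.Linear` models and the random batch are just stand-ins for your own models and data loaders:

```python
# Sketch: two independent models, one per GPU (models/data are placeholders).
import torch
import torch.nn as nn

device_a = torch.device("cuda", index=0)  # GPU 0 for Model A
device_b = torch.device("cuda", index=1)  # GPU 1 for Model B

model_a = nn.Linear(128, 10).to(device_a)  # placeholder for Model A
model_b = nn.Linear(128, 10).to(device_b)  # placeholder for Model B

# Move each batch to the same device as the model consuming it.
batch = torch.randn(32, 128)
out_a = model_a(batch.to(device_a))
out_b = model_b(batch.to(device_b))
```

In practice, the simplest setup is often two separate processes, one per GPU, e.g. launching `CUDA_VISIBLE_DEVICES=0 python train_a.py` and `CUDA_VISIBLE_DEVICES=1 python train_b.py` (script names here are hypothetical), so each run sees only its own device as `cuda:0`.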
