DataParallel tutorial: usage of .to(device)


I am new to torch and multi-GPU usage. I went through the Data Parallelism tutorial and I am confused by the usage of .to(device) in the multi-GPU case. Removing some intermediate lines of code, we are left with something like this:

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

all_available_devices = [0, 1, 2, 3]
model = some_NN()
model = nn.DataParallel(model, device_ids=all_available_devices)
model.to(device)

Now, this last line of code, model.to(device), is confusing me. Maybe my understanding of .to(device) is not correct. In the single-GPU case, .to(device) allocates the model, gradients, and feature maps (in the case of a CNN) to our device, correct? So what does this line do in the multi-GPU case, where the model is saved on all_available_devices?

In addition, in the single-GPU case I allocate e.g. the training data on a specific device (i.e. data.to(device)). Can this impede the data flow that DataParallel handles behind the curtains?
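For context, this is roughly the single-GPU pattern I am referring to, extended with the DataParallel wrapping from the tutorial (the model and tensor shapes here are hypothetical stand-ins, not from the tutorial itself):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2)  # hypothetical stand-in for some_NN()
if torch.cuda.device_count() > 1:
    # replicates the model across GPUs on each forward pass
    model = nn.DataParallel(model)
model = model.to(device)  # parameters live on the main device

# the input batch also goes to the main device;
# DataParallel scatters it across the replicas if wrapped
inputs = torch.randn(8, 10).to(device)
outputs = model(inputs)
print(outputs.shape)  # torch.Size([8, 2])
```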


nn.DataParallel will “copy” the model to multiple GPUs automatically; .to(device) will load the model onto the main device.
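Conceptually, the forward pass of DataParallel can be sketched on the CPU with torch.chunk: the batch sitting on the main device is split along dim 0, each model replica processes one chunk, and the outputs are gathered back on the main device. This is only an illustrative sketch, not the actual implementation:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)      # stands in for a replica of the model
batch = torch.randn(6, 4)    # batch on the "main device"

# scatter: one chunk per "GPU"
chunks = torch.chunk(batch, 2, dim=0)
# replicate + forward: each replica is a copy of the same model
outs = [model(c) for c in chunks]
# gather: concatenate outputs back on the main device
gathered = torch.cat(outs, dim=0)

# equivalent to running the whole batch through one model
assert torch.allclose(gathered, model(batch), atol=1e-6)
```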

Thank you, and just to be clear: device from the above example would be GPU-1 in the graphic of this post?

And about the 2nd question: Following the graphic of the same post, is it also the fastest approach to allocate the data on the same device as the model?

It’s just a way to use multi-GPU; it has nothing to do with being faster or not.