I am new to PyTorch and multi-GPU usage. I went through the tutorial and I am confused by the usage of
`model.to(device)` in the multi-GPU case. Removing some intermediate lines of code, we are left with something like this:

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
all_available_devices = [0, 1, 2, 3]

model = some_NN()
model = nn.DataParallel(model, device_ids=all_available_devices)
model.to(device)
```

Now, this last line of code is confusing me. Maybe my understanding of
`model.to()` is not correct. In the single-GPU case, `model.to(device)` moves the model's parameters, gradients, and feature maps (in the case of a CNN) to our `device`, correct? Now what does this line do in the multi-GPU case, where the model is replicated across multiple GPUs by `DataParallel`?
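For concreteness, here is the single-GPU pattern I am comparing against (a minimal sketch; `TinyNet` and the tensor shapes are made up for illustration):

```python
import torch
import torch.nn as nn

# Placeholder network, made up for illustration
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = TinyNet().to(device)      # parameters now live on `device`
x = torch.randn(8, 4).to(device)  # inputs must be on the same device
out = model(x)

print(out.device)  # same device as the model's parameters
```

Here `.to(device)` is applied both to the model and to every batch of data, and my mental model is that everything then lives on that one device.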
In addition, in the single-GPU case I allocate e.g. the training data on a specific device (i.e. `data.to(device)`). Can this impede the data flow that `DataParallel()` handles behind the curtains?
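In other words, my current training step looks roughly like this (a simplified sketch; the batch is random data, and the `device_count()` guard is just so it also runs without multiple GPUs):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(4, 2)
if torch.cuda.device_count() > 1:
    # DataParallel scatters each input batch across its device_ids on forward()
    model = nn.DataParallel(model)
model.to(device)

batch = torch.randn(8, 4)
batch = batch.to(device)  # this is the line I am unsure about:
                          # does pinning the batch to cuda:0 conflict with
                          # DataParallel's own scatter across GPUs?
out = model(batch)
print(out.shape)
```

Is moving the whole batch to `cuda:0` first the intended usage, or does it interfere with the splitting that `DataParallel` performs?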