DataParallel tutorial: usage of .to(device)


I am new to torch and multi-GPU usage. I went through the Data Parallelism tutorial and I am confused by the usage of .to(device) in the multi-GPU case. Removing some intermediate lines of code, we are left with something like this:

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

all_available_devices = [0, 1, 2, 3]
model = some_NN()
model = nn.DataParallel(model, device_ids=all_available_devices)
model.to(device)

Now, this last line of code, model.to(device), is confusing me. Maybe my understanding of .to(device) is not correct. In the single-GPU case, .to(device) allocates the model, gradients, and feature maps (in the case of a CNN) to our device, correct? So what does this line do in the multi-GPU case, where the model is saved on all_available_devices?

In addition, in the single-GPU case I allocate e.g. the training data on a specific device (i.e. data.to(device)). Can this impede the data flow that DataParallel handles behind the curtains?
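For context, this is roughly the single-GPU pattern I am referring to, extended with the DataParallel wrapping from the tutorial (the model and tensor shapes here are hypothetical stand-ins, not from the tutorial itself):

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2)  # hypothetical stand-in for some_NN()
if torch.cuda.device_count() > 1:
    # replicates the model across GPUs on each forward pass
    model = nn.DataParallel(model)
model = model.to(device)  # parameters live on the main device

# the input batch also goes to the main device;
# DataParallel scatters it across the replicas if wrapped
inputs = torch.randn(8, 10).to(device)
outputs = model(inputs)
print(outputs.shape)  # torch.Size([8, 2])
```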


nn.DataParallel will “copy” the model to multiple GPUs automatically; .to(device) will load the model onto the main device.
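Conceptually, the forward pass of DataParallel can be sketched on the CPU with torch.chunk: the batch sitting on the main device is split along dim 0, each model replica processes one chunk, and the outputs are gathered back on the main device. This is only an illustrative sketch, not the actual implementation:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 3)      # stands in for a replica of the model
batch = torch.randn(6, 4)    # batch on the "main device"

# scatter: one chunk per "GPU"
chunks = torch.chunk(batch, 2, dim=0)
# replicate + forward: each replica is a copy of the same model
outs = [model(c) for c in chunks]
# gather: concatenate outputs back on the main device
gathered = torch.cat(outs, dim=0)

# equivalent to running the whole batch through one model
assert torch.allclose(gathered, model(batch), atol=1e-6)
```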

Thank you, and just to be clear: device from the above example would be GPU-1 in the graphic of this post?

And about the 2nd question: Following the graphic of the same post, is it also the fastest approach to allocate the data on the same device as the model?

It’s just a way to use multi-GPU; it has nothing to do with being faster or not.