What does net.to(device) do in nn.DataParallel

The following code from the PyTorch data parallelism tutorial seems strange to me:

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
    print("Let's use", torch.cuda.device_count(), "GPUs!")
    # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
    model = nn.DataParallel(model)

model.to(device)

To the best of my knowledge, model.to(device) copies the data to the GPU.

DataParallel splits your data automatically and sends job orders to multiple models on several GPUs. After each model finishes its job, DataParallel collects and merges the results before returning them to you.

If DataParallel does the job of copying, what does to(device) do here?

It moves the model's parameters and buffers (the weights) to the GPU.
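
You can see this directly by checking where the parameters live before and after the call. This is a minimal sketch that stands in a toy nn.Linear for the tutorial's Model:

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(5, 2)                    # toy model standing in for Model(...)
print(next(model.parameters()).device)     # cpu -- weights start on the CPU

model = nn.DataParallel(model)             # wrapping does NOT move anything yet
model.to(device)                           # this is what actually moves the weights
print(next(model.parameters()).device)     # cuda:0 if a GPU is available, else cpu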

If so, what does nn.DataParallel(model) do then?

On calling forward it splits the input into multiple chunks (one chunk per GPU), replicates the underlying model to multiple GPUs, runs forward on each of them, and gathers the outputs.
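
Conceptually the forward pass looks roughly like the sketch below. It is a simplified version assuming a single positional input tensor; the real implementation (in torch/nn/parallel/data_parallel.py) also handles keyword arguments and various edge cases:

import torch.nn as nn

def data_parallel_forward(module, input, device_ids, output_device=None):
    # module's parameters must already be on device_ids[0] -- that is what model.to(device) did
    if output_device is None:
        output_device = device_ids[0]

    replicas = nn.parallel.replicate(module, device_ids)      # copy the model to every GPU
    inputs = nn.parallel.scatter(input, device_ids)            # split the batch along dim 0
    replicas = replicas[:len(inputs)]                          # drop unused replicas for small batches
    outputs = nn.parallel.parallel_apply(replicas, inputs)     # run forward on each GPU in parallel
    return nn.parallel.gather(outputs, output_device)          # concatenate the results on one device

This is also why the tutorial still calls model.to(device) after wrapping: nn.DataParallel expects the module's parameters and buffers to already be on device_ids[0] before forward is called, and replicate copies the model to the other GPUs from there.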

Thank you. I think I need to read more of PyTorch's core code to fully understand this.