What does net.to(device) do in nn.DataParallel?

The following code from the PyTorch data parallelism tutorial looks strange to me:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = Model(input_size, output_size)
if torch.cuda.device_count() > 1:
  print("Let's use", torch.cuda.device_count(), "GPUs!")
  # dim = 0 [30, xxx] -> [10, ...], [10, ...], [10, ...] on 3 GPUs
  model = nn.DataParallel(model)

model.to(device)


To the best of my knowledge, model.to(device) copies the data to the GPU.

DataParallel splits your data automatically and sends job orders to multiple models on several GPUs. After each model finishes their job, DataParallel collects and merges the results before returning it to you.

If DataParallel already does the copying, what does to(device) do here?

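It moves the model's weights (its parameters and buffers) to the GPU given by device, cuda:0 here. It does not move your input data. DataParallel expects the model to already be on that first GPU before forward is called.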
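To make the difference between moving a model and moving data concrete, here is a minimal sketch. It uses nn.Linear as a stand-in for the tutorial's Model (an assumption for illustration), and it falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in for the tutorial's Model (assumption: any nn.Module behaves the same here).
model = nn.Linear(5, 2)

# Module.to moves the parameters and buffers in place and returns the same module,
# so reassigning the result is optional for modules.
returned = model.to(device)
same_object = returned is model
param_device = next(model.parameters()).device

# Input tensors are separate from the model and must be moved explicitly.
# Tensor.to is not in-place, so the result has to be kept.
x = torch.randn(30, 5).to(device)
out = model(x)

print("parameters on:", param_device)
print("output shape:", tuple(out.shape))
```

So model.to(device) only concerns the weights; the batch is moved by a separate .to(device) call on the input tensor.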

If so, what does nn.DataParallel(model) do then?

On calling forward it splits the input into multiple chunks (one chunk per GPU), replicates the underlying model to multiple GPUs, runs forward on each of them, and gathers the outputs.
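The steps described above can be sketched with the functional primitives in torch.nn.parallel. This is a simplified illustration, not the actual nn.DataParallel source: the helper name data_parallel_sketch and the nn.Linear stand-in model are assumptions, and the real implementation also handles keyword arguments and other edge cases.

```python
import torch
import torch.nn as nn

def data_parallel_sketch(module, inputs, device_ids):
    """Simplified view of one DataParallel forward pass (hypothetical helper)."""
    if not device_ids:
        # No GPUs available: just run the module as-is.
        return module(inputs)
    output_device = device_ids[0]
    # 1. Copy the model from the first GPU to every GPU in device_ids.
    replicas = nn.parallel.replicate(module, device_ids)
    # 2. Split the batch along dim 0, one chunk per GPU.
    chunks = nn.parallel.scatter(inputs, device_ids)
    replicas = replicas[:len(chunks)]  # a small batch may yield fewer chunks
    # 3. Run each replica on its own chunk in parallel.
    outputs = nn.parallel.parallel_apply(replicas, chunks)
    # 4. Concatenate the per-GPU outputs on the output device.
    return nn.parallel.gather(outputs, output_device)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device_ids = list(range(torch.cuda.device_count()))

model = nn.Linear(5, 2).to(device)   # weights must already be on the first GPU
x = torch.randn(30, 5).to(device)

result = data_parallel_sketch(model, x, device_ids)
print(tuple(result.shape))
```

Step 1 is why model.to(device) is still needed: the replicas are copied from the model sitting on the first GPU, so the weights have to be there before forward runs.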

Thank you. I think I need to read more of PyTorch's core code to fully understand this.