Using DataParallel in a Jupyter notebook

Hi, I'm trying to use DataParallel in a Jupyter notebook, following the tutorial Optional: Data Parallelism — PyTorch Tutorials 1.8.1+cu102 documentation. It's not clear to me how the multi-GPU version works. At the top of the page it sets device = torch.device("cuda:0"); isn't that sending all tensors to the first GPU? Then later it writes, which also sends to the first gpu. Thanks for any clarification.

nn.DataParallel will allocate the model, inputs, and outputs on the default device (or whichever is used in the to() operation) and will then scatter and gather the parameters, data, gradients etc. as described in this blog post.
This communication overhead can cause a slowdown compared to DistributedDataParallel with a single process per GPU, which is why we generally recommend using the latter approach.
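
To make the scatter/gather behaviour concrete, here is a minimal sketch you could paste into a notebook cell. ToyModel and the tensor sizes are made up for illustration and are not taken from the tutorial. The model and the full batch are placed on cuda:0, and nn.DataParallel then splits the batch along dim 0 across the visible GPUs. The snippet falls back to a single device if fewer than two GPUs are available.

```python
import torch
import torch.nn as nn


# Hypothetical toy model, used only to show where each batch slice ends up.
class ToyModel(nn.Module):
    def __init__(self, in_features=5, out_features=2):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, x):
        # Under nn.DataParallel each replica receives only its slice of the batch.
        print(f"  inside forward: input {tuple(x.shape)} on {x.device}")
        return self.fc(x)


# "cuda:0" is the primary device: parameters and the full batch live here,
# and the outputs are gathered back onto it.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = ToyModel()
if torch.cuda.device_count() > 1:
    # Replicates the module on every visible GPU during each forward pass.
    model = nn.DataParallel(model)
model.to(device)

batch = torch.randn(30, 5).to(device)  # the whole batch starts on the primary device
output = model(batch)                  # scattered along dim 0, gathered afterwards
print("outside:", tuple(output.shape), "on", output.device)
```

With more than one GPU, the forward print should appear once per GPU, each with a smaller batch dimension and a different device, while the gathered output is back on cuda:0 with the full batch size. So moving everything to cuda:0 first is expected; it only fixes where the data starts and where the results end up.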