How does PyTorch utilize multiple GPUs?

Hi guys, I was reading this DataParallel tutorial and I have several questions. Suppose I have two GPUs, GPU A and GPU B:

  1. How is the model allocated on A and B? Based on my experiments, it seems that a copy of the model is made on both A and B. That is, if my model is 1 GB in size, then A and B each use 1 GB of memory. Is this correct?
  2. How is the data allocated on A and B? From the tutorial, it seems you only need to apply nn.DataParallel to the model, and not to the data?
  3. How does batch size affect memory allocation? My GPU can handle a small batch size (say 10), but runs out of memory with a larger batch size (say 30).
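
For reference, here is a minimal sketch of the kind of setup I am asking about. The toy `nn.Linear` model, the feature sizes, and the names `parallel_model` and `batch` are placeholders I made up for illustration, not my real code. As far as I understand, only the model is wrapped, and the input stays an ordinary tensor that `nn.DataParallel` splits along dimension 0 across the visible GPUs.

```python
import torch
import torch.nn as nn

# Hypothetical toy model standing in for the real one (assumption for illustration).
model = nn.Linear(16, 4)

# Use the GPU(s) if any are visible; otherwise everything stays on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)

# Only the model is wrapped. With no GPU visible, DataParallel
# just calls the wrapped module directly.
parallel_model = nn.DataParallel(model)

# The batch is a plain tensor; batch size 30 is the "large" case from question 3.
batch = torch.randn(30, 16, device=device)

# The forward pass takes the whole batch and returns the whole output.
output = parallel_model(batch)
print(output.shape)
```

With two GPUs I would expect each one to receive roughly half of the 30 samples, which is part of what I would like to confirm.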

Appreciate any answers/references. Thanks.