Hi all, I was reading the DataParallel tutorial https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html and I have several questions. Suppose I have two GPUs, A and B:
- How is the model allocated across A and B? Based on my experiments, it seems a full copy of the model is placed on each GPU. That is, if my model is 1 GB, both A and B will each use 1 GB of memory. Is this correct?
- How is the data allocated across A and B? From the tutorial, it seems you only need to wrap the model in nn.DataParallel, not the data. Does the batch get split between the GPUs automatically?
- How does batch size affect memory allocation? My GPUs can handle a small batch size (say 10), but run out of memory with a large batch size (say 30).
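For reference, here is a minimal sketch of the setup I'm describing (the model and tensor sizes are just placeholders, not my actual model):

```python
import torch
import torch.nn as nn

# Toy model standing in for the ~1 GB model in the question.
model = nn.Linear(10, 2)

# nn.DataParallel replicates the wrapped model onto every visible GPU;
# with GPUs A and B, each holds its own copy of the parameters.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
if torch.cuda.is_available():
    model = model.cuda()

# Only the model is wrapped; the full batch is passed in as usual and
# DataParallel scatters it along dim 0 (e.g. a batch of 30 becomes
# two slices of 15 with two GPUs).
x = torch.randn(30, 10)
if torch.cuda.is_available():
    x = x.cuda()

out = model(x)
print(out.shape)  # torch.Size([30, 2]) — per-GPU outputs are gathered back
```

This also runs on CPU (DataParallel is simply skipped), which is how I've been checking shapes locally.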
Appreciate any answers/references. Thanks.