Batch size and DistributedDataParallel


My question is: does the DataLoader batch_size parameter represent the batch size per process (i.e., per GPU) in PyTorch DistributedDataParallel? Or is the DataLoader batch_size divided by the world_size, so that each process gets a fraction of that batch size?

In my implementation on 1 node with 3 GPUs, when I set batch_size to 1, each process loads input with a batch dimension of 1; when I change batch_size to 3, each process loads input with a batch dimension of 3. In the second case, should the batch have been split across the processes/GPUs so that each one gets input with a batch dimension of 1?
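
To make the observation above concrete, here is a minimal sketch that simulates the three ranks in a single process. It assumes a DistributedSampler is used for sharding (my actual code may differ), and the toy dataset, WORLD_SIZE, BATCH_SIZE, and the first_batch_shape helper are made up for illustration. Passing num_replicas and rank explicitly lets the sampler run without initializing a process group.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

WORLD_SIZE = 3  # assumption: one node with 3 GPUs, one process per GPU
BATCH_SIZE = 3  # the value passed to DataLoader(batch_size=...)

# Toy dataset: 18 samples with one feature each (hypothetical data).
dataset = TensorDataset(torch.arange(18, dtype=torch.float32).unsqueeze(1))


def first_batch_shape(rank):
    """Return (shape of first batch, batches per epoch) as seen by one rank."""
    # Each rank gets its own disjoint shard of the dataset indices.
    sampler = DistributedSampler(
        dataset, num_replicas=WORLD_SIZE, rank=rank, shuffle=False
    )
    # batch_size is applied to this rank's shard only.
    loader = DataLoader(dataset, batch_size=BATCH_SIZE, sampler=sampler)
    (x,) = next(iter(loader))
    return tuple(x.shape), len(loader)


shapes = {rank: first_batch_shape(rank) for rank in range(WORLD_SIZE)}
for rank, (shape, n_batches) in shapes.items():
    print(f"rank {rank}: batch shape {shape}, batches per epoch {n_batches}")
```

In this sketch every simulated rank sees a batch dimension equal to BATCH_SIZE, which matches what I observe in my real run: the value given to DataLoader is not split across processes, so the number of samples processed per step across the node would be BATCH_SIZE * WORLD_SIZE.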

I am a beginner in PyTorch and just want to make sure that my implementation is working correctly. My model architecture only runs with batch_size=1, so I also want to make sure that each GPU loads its data correctly and that, at the node level, I end up with an effective batch size of 3.

I believe this is similar to the ImageNet example, which divides the total batch size by the number of GPUs:

examples/ at main · pytorch/examples (
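
As far as I understand the ImageNet example, it treats the batch size given by the user as the total for the node and divides it by the number of GPUs before building each process's DataLoader. A minimal sketch of that arithmetic follows; the function name per_process_batch_size is hypothetical, and the divisibility check is my own addition rather than something taken from the example.

```python
def per_process_batch_size(total_batch_size: int, gpus_per_node: int) -> int:
    """Split a node-level batch size evenly across the GPUs on that node."""
    # Assumption: we require an even split; the check is illustrative only.
    if total_batch_size % gpus_per_node != 0:
        raise ValueError("total batch size should be divisible by the number of GPUs")
    return total_batch_size // gpus_per_node


# With a node-level batch of 3 on 3 GPUs, each process would use batch_size=1.
per_gpu = per_process_batch_size(total_batch_size=3, gpus_per_node=3)
print(per_gpu)
```

Under this convention, the value I pass to DataLoader in each process would be the already divided per-GPU number, not the node-level total.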