Problems with batch size when using DataParallel and DistributedDataParallel

Suppose I have 4 GPUs and train the model with DataParallel and with DistributedDataParallel, passing batch_size=64 to the DataLoader in each case. What is the equivalent actual batch size (compared to training on a single GPU)?
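For reference, the two wrappers treat the DataLoader's batch_size differently, so the arithmetic below is a sketch of the usual PyTorch semantics (the numbers, like world_size=4, come from the question above):

```python
world_size = 4     # number of GPUs, as in the question
loader_batch = 64  # batch_size passed to the DataLoader

# DataParallel: a single process and a single DataLoader; each 64-sample
# batch is scattered along dim 0 across the GPUs, so every GPU sees
# 64 / 4 = 16 samples and the effective global batch stays 64.
dp_per_gpu = loader_batch // world_size

# DistributedDataParallel: one process (and one DataLoader) per GPU, so
# each GPU sees the full 64 samples and the effective global batch is
# 64 * 4 = 256.
ddp_global = loader_batch * world_size

print(dp_per_gpu, ddp_global)  # → 16 256
```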

For DistributedDataParallel, you should use DistributedSampler together with the DataLoader; otherwise each rank will process the same data, which is equivalent to a single GPU processing the original dataset replicated 4 times.
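To see why the sampler matters, here is a minimal pure-Python sketch of the index sharding that DistributedSampler performs (the function name `shard_indices` is illustrative, not the torch API): each rank gets a disjoint, strided slice of the index list, padded so all ranks receive the same number of samples.

```python
import math

def shard_indices(dataset_len, num_replicas, rank):
    """Sketch of DistributedSampler-style sharding: pad the index list to a
    multiple of num_replicas, then give each rank a strided slice."""
    indices = list(range(dataset_len))
    total_size = math.ceil(dataset_len / num_replicas) * num_replicas
    indices += indices[: total_size - dataset_len]  # pad by wrapping around
    return indices[rank:total_size:num_replicas]

# With 4 ranks over a 10-sample dataset, each rank sees a different shard:
shards = [shard_indices(10, 4, r) for r in range(4)]
print(shards)  # → [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
```

In practice you get this behavior by constructing `torch.utils.data.distributed.DistributedSampler(dataset)` in each process and passing it via the `sampler` argument of the DataLoader; without it, every rank iterates the full dataset in the same order.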

I am not too familiar with DataParallel, and we generally recommend using DistributedDataParallel instead if possible. If your question is purely about conceptual understanding, I apologize that I do not know the answer for DataParallel :frowning:


Thank you for your help :blush: