Problems with batch size when using DataParallel and DistributedDataParallel

Suppose I have 4 GPUs and train the model with DataParallel and with DistributedDataParallel, passing batch_size=64 to the DataLoader in each case. What is the equivalent actual batch size (compared to training on a single GPU)?
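For reference, the two wrappers treat the DataLoader's batch_size differently, so the arithmetic below is a sketch of the usual PyTorch semantics (the numbers, like world_size=4, come from the question above):

```python
world_size = 4     # number of GPUs, as in the question
loader_batch = 64  # batch_size passed to the DataLoader

# DataParallel: a single process and a single DataLoader; each 64-sample
# batch is scattered along dim 0 across the GPUs, so every GPU sees
# 64 / 4 = 16 samples and the effective global batch stays 64.
dp_per_gpu = loader_batch // world_size

# DistributedDataParallel: one process (and one DataLoader) per GPU, so
# each GPU sees the full 64 samples and the effective global batch is
# 64 * 4 = 256.
ddp_global = loader_batch * world_size

print(dp_per_gpu, ddp_global)  # → 16 256
```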

For DistributedDataParallel, you should use DistributedSampler together with the DataLoader; otherwise each rank will process the same data, which is equivalent to a single GPU processing the original dataset replicated 4 times.
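To see why the sampler matters, here is a minimal pure-Python sketch of the index sharding that DistributedSampler performs (the function name `shard_indices` is illustrative, not the torch API): each rank gets a disjoint, strided slice of the index list, padded so all ranks receive the same number of samples.

```python
import math

def shard_indices(dataset_len, num_replicas, rank):
    """Sketch of DistributedSampler-style sharding: pad the index list to a
    multiple of num_replicas, then give each rank a strided slice."""
    indices = list(range(dataset_len))
    total_size = math.ceil(dataset_len / num_replicas) * num_replicas
    indices += indices[: total_size - dataset_len]  # pad by wrapping around
    return indices[rank:total_size:num_replicas]

# With 4 ranks over a 10-sample dataset, each rank sees a different shard:
shards = [shard_indices(10, 4, r) for r in range(4)]
print(shards)  # → [[0, 4, 8], [1, 5, 9], [2, 6, 0], [3, 7, 1]]
```

In practice you get this behavior by constructing `torch.utils.data.distributed.DistributedSampler(dataset)` in each process and passing it via the `sampler` argument of the DataLoader; without it, every rank iterates the full dataset in the same order.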

I am not too familiar with DataParallel, and we generally recommend using DistributedDataParallel instead if possible. If your question is purely about conceptual understanding, I apologize that I do not know the answer for DataParallel :frowning:


Thank you for your help :blush: