Suppose I have 4 GPUs and train the model with `DataParallel` or `DistributedDataParallel`, passing `batch_size=64` to the `DataLoader`. In each case, what is the equivalent actual batch size (compared to using a single GPU)?
With `DistributedDataParallel`, you should use the `DistributedSampler` in combination with the `DataLoader`, or else each rank will process the same data, which would be equivalent to a single GPU processing a dataset that replicates the original dataset 4 times.
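A minimal sketch of that setup, assuming the process group has already been initialized (e.g. via `dist.init_process_group`) and that `dataset` and `num_epochs` are placeholders for your own training dataset and epoch count:

```python
import torch.distributed as dist
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

# Assumes dist.init_process_group(...) has already been called in each
# of the 4 processes, and `dataset` is your training dataset.
sampler = DistributedSampler(
    dataset,
    num_replicas=dist.get_world_size(),  # 4 in this example
    rank=dist.get_rank(),
    shuffle=True,
)

# Each rank now iterates over a disjoint 1/4 shard of the dataset.
# batch_size=64 is the per-rank batch size, so with 4 ranks the
# effective global batch per optimizer step is 4 * 64 = 256.
loader = DataLoader(dataset, batch_size=64, sampler=sampler)

for epoch in range(num_epochs):
    # Reshuffle the shard assignment each epoch so ranks see
    # different orderings of the data.
    sampler.set_epoch(epoch)
    for batch in loader:
        ...  # forward / backward / optimizer step
```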
I am not too familiar with `DataParallel`, and we recommend trying `DistributedDataParallel` if possible. If your question is only for conceptual understanding, I apologize that I do not know.
Thank you for your help