Suppose I have 4 GPUs and train a model with either DataParallel or DistributedDataParallel, passing batch_size=64 to the DataLoader. What is the equivalent effective batch size (compared to using a single GPU)?
For DistributedDataParallel, you should use DistributedSampler in combination with the DataLoader; otherwise each rank will process the same data, which is equivalent to a single GPU training on the original dataset replicated 4 times. With the sampler in place, each rank draws batch_size=64 distinct samples per step, so the effective global batch size is 64 × 4 = 256.
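A minimal plain-Python sketch of the sharding idea (the variable names here are illustrative, not PyTorch API): each of the 4 ranks takes every 4th index, so the shards are disjoint and together cover the dataset. Without this, every rank would iterate the full index list and see identical data.

```python
world_size = 4      # number of ranks/GPUs (assumed for illustration)
dataset_size = 256  # assumed dataset length, divisible by world_size
indices = list(range(dataset_size))

# Round-robin sharding: rank r takes indices r, r + world_size, r + 2*world_size, ...
shards = [indices[rank::world_size] for rank in range(world_size)]

# The shards are disjoint and cover the whole dataset exactly once.
all_samples = sorted(i for shard in shards for i in shard)
assert all_samples == indices
assert all(len(shard) == dataset_size // world_size for shard in shards)

# With batch_size=64 per rank, the effective global batch is 64 * 4 = 256.
per_rank_batch = 64
global_batch = per_rank_batch * world_size
print(global_batch)  # 256
```

(The real DistributedSampler also handles shuffling per epoch and padding when the dataset length is not divisible by the world size; this sketch omits both.)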
I am not too familiar with DataParallel, and we recommend trying DistributedDataParallel if possible. If your question is only for conceptual understanding, I apologize that I do not know the answer for that case.
Thank you for your help!