When doing `data_loader = DataLoader(my_dataset, sampler=DistributedSampler(my_dataset), batch_size=N)` in a DDP distributed training script, how many records does each GPU (worker? process? I'm not sure what the most accepted name is) receive at each iteration?
Does each DDP GPU receive `N` records, or `N / gpu_count`?
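For concreteness, here is a minimal, self-contained sketch of the setup I mean. The dataset contents, `N = 32`, and the `torchrun` launch command are placeholders I made up just to make the question concrete:

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Launched with e.g. `torchrun --nproc_per_node=4 script.py`,
# so each process gets its rank/world_size from the environment.
dist.init_process_group(backend="nccl")

# Placeholder dataset: 1000 records of 8 features each.
my_dataset = TensorDataset(torch.randn(1000, 8))

N = 32  # per-loader batch_size -- the value my question is about

# DistributedSampler partitions the dataset indices across ranks.
sampler = DistributedSampler(my_dataset)
data_loader = DataLoader(my_dataset, sampler=sampler, batch_size=N)

for (batch,) in data_loader:
    # The question: is batch.shape[0] == N on every rank here,
    # or N / world_size?
    print(dist.get_rank(), batch.shape)
    break

dist.destroy_process_group()
```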