Do DataParallel and DistributedDataParallel affect the batch size and GPU memory consumption?

This is something I have also been confused about. I don’t see how the number of GPUs would affect batch size selection in DDP, given that the batch size you specify is per GPU/process. I would definitely appreciate some clarification if possible @mrshenli. :slight_smile:
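
For what it’s worth, here is a minimal sketch of how I understand the DDP convention (assuming a typical `torchrun` launch; the dataset and the sizes are made up for illustration): the `batch_size` you pass to each process’s `DataLoader` is the per-GPU batch size, and the effective global batch size per optimizer step is that value times the world size.

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Assumes this script is launched once per GPU, e.g.:
#   torchrun --nproc_per_node=4 this_script.py
dist.init_process_group("nccl")
world_size = dist.get_world_size()

dataset = TensorDataset(torch.randn(1024, 10))  # toy dataset for illustration
per_gpu_batch_size = 32  # this is what each process passes to its DataLoader

# DistributedSampler gives each process a disjoint shard of the dataset,
# so every process loads `per_gpu_batch_size` samples per step.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=per_gpu_batch_size, sampler=sampler)

# Effective (global) batch size per optimizer step across all processes:
global_batch_size = per_gpu_batch_size * world_size  # e.g. 32 * 4 = 128
```

With `nn.DataParallel`, by contrast, there is a single process, and the one input batch is scattered across the GPUs inside `forward`, so the `batch_size` you specify there is the total across all devices rather than per GPU. That asymmetry is, as far as I can tell, where most of the confusion comes from.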