This is also something I have been confused about. I don't understand how the number of GPUs should have any effect on batch size selection in DDP, given that the specified batch size applies to each GPU/process. I would definitely appreciate some clarification if possible @mrshenli.
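For concreteness, this is my mental model of a DDP setup, as a minimal runnable sketch (the CPU/gloo backend, toy dataset, world size of 4, and port are placeholders I chose just so it runs without GPUs; in practice each rank would pin its own CUDA device):

```python
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def worker(rank, world_size):
    # One process per device; each process sees only its own shard of the data.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    # As I understand it, batch_size here is PER PROCESS: every rank draws
    # 4 samples per step, so the effective global batch is 4 * world_size.
    loader = DataLoader(dataset, batch_size=4, sampler=sampler)

    model = DDP(torch.nn.Linear(10, 1))
    for x, y in loader:
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()  # gradients are all-reduced (averaged) across ranks
        break

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(4,), nprocs=4)
```

Under this reading, every one of the 8 GPUs would still get 4 samples per step, so I don't see where underutilization comes from.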
AMellinger:
mrshenli:
Can I use a batch_size lower than the number of GPUs, e.g. batch_size=4 on 8 GPUs? Will it lead to an error, will only 4 GPUs be used, or will the batch_size be increased to 8 or 32?
It should work, but will not fully utilize all devices. If batch_size=4, IIUC, it can at most use 4 GPUs.
Greetings! I'd like some clarification on this. Is this response referring to DP or DDP? If DDP, isn't batch_size per process? Meaning, if one sets batch_size=4 in the DataLoader, isn't that 4 samples per process/GPU? How does that turn into "it can at most use 4 GPUs"?
I guess I have always been confused by the DDP statement "The batch size should be larger than the number of GPUs used locally," because we set batch_size per process/GPU, not for the entire set of GPUs in aggregate. Or does "batch size" have two different meanings?
Thanks for any help!
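For what it's worth, the only reading under which "it can at most use 4 GPUs" makes sense to me is nn.DataParallel semantics, where a single input batch is scattered along dim 0 across devices. A minimal sketch of that interpretation (assuming a multi-GPU CUDA machine; the tiny model is a placeholder):

```python
import torch
import torch.nn as nn

# nn.DataParallel replicates the module and scatters ONE input batch along
# dim 0 across all visible GPUs. A batch of 4 can therefore be split over
# at most 4 devices; on an 8-GPU machine the remaining 4 would sit idle.
model = nn.DataParallel(nn.Linear(10, 1).cuda())

x = torch.randn(4, 10).cuda()  # batch_size=4 -> at most 4 replicas get work
out = model(x)                 # outputs are gathered back onto device 0
```

If the quoted answer was indeed about DP rather than DDP, that would resolve the apparent contradiction, but I'd appreciate confirmation.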