Do DataParallel and DistributedDataParallel affect the batch size and GPU memory consumption?

This is something I have also been confused about. I don’t see how the number of GPUs would affect batch size selection in DDP, given that the batch size you specify is per GPU/process. I would definitely appreciate some clarification if possible @mrshenli. :slight_smile:
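
For what it’s worth, here is a minimal sketch of how I understand the DDP convention (assuming a typical `torchrun` launch; the dataset and the sizes are made up for illustration): the `batch_size` you pass to each process’s `DataLoader` is the per-GPU batch size, and the effective global batch size per optimizer step is that value times the world size.

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Assumes this script is launched once per GPU, e.g.:
#   torchrun --nproc_per_node=4 this_script.py
dist.init_process_group("nccl")
world_size = dist.get_world_size()

dataset = TensorDataset(torch.randn(1024, 10))  # toy dataset for illustration
per_gpu_batch_size = 32  # this is what each process passes to its DataLoader

# DistributedSampler gives each process a disjoint shard of the dataset,
# so every process loads `per_gpu_batch_size` samples per step.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=per_gpu_batch_size, sampler=sampler)

# Effective (global) batch size per optimizer step across all processes:
global_batch_size = per_gpu_batch_size * world_size  # e.g. 32 * 4 = 128
```

With `nn.DataParallel`, by contrast, there is a single process, and the one input batch is scattered across the GPUs inside `forward`, so the `batch_size` you specify there is the total across all devices rather than per GPU. That asymmetry is, as far as I can tell, where most of the confusion comes from.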