Hi,
I want my overall (effective) batch size across all GPUs to be 128.
When I train on 8 GPUs (2 nodes with 4 GPUs each), so world_size=8 and ngpus_per_node=4,
what value should I pass as batch_size to the DataLoader?
(I run torchrun with DDP.)
Should it be 128 / world_size = 16, 128 / ngpus_per_node = 32, or just 128?
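
For context, here is a minimal sketch of my setup (the dataset is a dummy stand-in, and `per_process_batch` is the value I'm unsure about):

```python
import os
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# torchrun sets these environment variables for each spawned process
world_size = int(os.environ["WORLD_SIZE"])  # 8 in my case (2 nodes x 4 GPUs)
local_rank = int(os.environ["LOCAL_RANK"])  # 0..3 on each node

dist.init_process_group(backend="nccl")  # reads MASTER_ADDR/PORT set by torchrun
torch.cuda.set_device(local_rank)

# dummy dataset just for illustration
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))
sampler = DistributedSampler(dataset)  # shards the data across the 8 processes

# This is the line in question: 128 // world_size = 16,
# 128 // ngpus_per_node = 32, or plain 128?
per_process_batch = 128 // world_size

loader = DataLoader(dataset, batch_size=per_process_batch, sampler=sampler)

dist.destroy_process_group()
```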
Thank you!