Let’s assume I train a model with a batch size of 64 on a single GPU. Now I want to train the same model on multiple GPUs using
nn.DataParallel. If I keep all other hyperparameters the same, I expect the two experiments to yield the same results.
But how do I have to specify the batch size to get the same results? Do I pass 64, so that every GPU (let’s assume I have 4) gets a batch size of 16 (64 / 4)? Or do I set the batch size to 4 * 64, so that every GPU gets 64? The documentation of
nn.DataParallel says it splits the input along the batch dimension, so I would assume I have to set the batch size to 4 * 64 to get the same results. That would be in line with this, but then I found this, which says I should keep the batch size at 64…
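To make sure I understand the splitting behavior correctly, here is a minimal pure-Python sketch (no GPU needed) of how I believe nn.DataParallel scatters a batch along dim 0, i.e. roughly like torch.chunk; the helper name and the even-split assumption are mine, not from the docs:

```python
def split_batch(batch_size, num_gpus):
    """Mimic how nn.DataParallel scatters an input along the batch
    dimension: the batch is divided into num_gpus near-equal chunks
    (remainder samples go to the first GPUs), similar to torch.chunk."""
    base, rem = divmod(batch_size, num_gpus)
    return [base + (1 if i < rem else 0) for i in range(num_gpus)]

# Option A: pass batch size 64 -> each of 4 GPUs sees 16 samples
print(split_batch(64, 4))   # [16, 16, 16, 16]

# Option B: pass batch size 4 * 64 = 256 -> each GPU sees 64 samples
print(split_batch(256, 4))  # [64, 64, 64, 64]
```

So with option B each replica would see the same per-GPU batch of 64 as in my single-GPU run, which is why I lean toward 4 * 64.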
Can anyone clarify this topic? Any help is very much appreciated!
All the best