Effects of increasing Batch_Size on Accuracy of Model

I have more of a conceptual question. When training a NN model (especially with DDP) on a single node with multiple GPUs, there is a natural urge to utilize as much GPU memory as possible. This is usually done by increasing the training batch size of input samples (e.g., from 32 to 64, 128, 256, 512, or even 1024).

So I have three questions:

  • Is there any change in validation accuracy (and loss) when we train on a larger number of GPUs?

  • If I increase the batch size to utilize the GPU memory, is there any effect on validation accuracy and loss?

  • Do we need to tune the other hyperparameters (like learning rate, weight decay, epochs, etc.) when changing the batch size in the above scenario?


Response to your questions:

  1. Yes, potentially. DDP computes the loss locally on each replica and averages gradients across replicas, which can produce a different result compared to computing a single global loss and back-propagating from it.
  2. Also potentially yes: batch size is itself a hyperparameter that can be tuned, and changing it can affect convergence.
  3. As a general note, it is good to re-tune all hyperparameters when switching to DDP. These hyperparameters can all affect the accuracy of the model compared to single-GPU training. However, the difference in accuracy should not be drastic (this is subjective); if it is, it warrants additional investigation.
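To make point 1 concrete, here is a toy check (with made-up numbers and a hypothetical one-parameter model, not anything from DDP itself) of what gradient averaging does: for a mean-reduced loss split into equal-sized shards, averaging the per-replica gradients reproduces the single global-batch gradient. Differences in practice tend to come from unequal shard sizes, reductions other than mean, or batch-dependent layers such as BatchNorm.

```python
# Toy check of DDP-style gradient averaging for a 1-parameter linear model
# with mean-squared-error loss: loss = mean((w*x - y)^2).

def grad(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over the given samples."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data and weight.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

# Single "global batch" gradient.
global_grad = grad(w, xs, ys)

# Split into two equal "replicas" and average their local gradients,
# which is what DDP's all-reduce effectively does.
avg_grad = (grad(w, xs[:2], ys[:2]) + grad(w, xs[2:], ys[2:])) / 2

print(abs(global_grad - avg_grad) < 1e-12)  # the two coincide for equal shards
```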
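As one concrete heuristic for point 3, a widely used starting point is the linear scaling rule: when the global batch size grows by a factor of k, scale the learning rate by k as well (usually combined with a warmup period). This is a rule of thumb, not a guarantee, and the baseline numbers below are hypothetical:

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: learning rate grows in proportion to the
    global batch size relative to a tuned baseline."""
    return base_lr * new_batch / base_batch

# Hypothetical baseline: lr 0.1 tuned at batch size 32 on one GPU.
# In DDP the global batch is world_size * per-GPU batch, e.g. 4 GPUs x 256.
global_batch = 4 * 256
print(scale_lr(0.1, 32, global_batch))  # -> 3.2
```

Such a large scaled learning rate is typically only usable with warmup; if training becomes unstable, a sub-linear (e.g. square-root) scaling is sometimes used instead.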