I have more of a conceptual question. When training a neural-network model on a single node with multiple GPUs (especially through DDP), there is a natural urge to utilize the maximum GPU memory. This is usually done by increasing the training batch size (e.g., from 32 to 64, 128, 256, 512, or even 1024 samples).
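For concreteness, my setup looks roughly like the minimal DDP sketch below (the toy model, data, and batch size are placeholders, not my actual training code). Note that with DDP the effective global batch size is the per-GPU batch size times the number of GPUs:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy data and model, stand-ins for the real ones.
    dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))
    model = DDP(nn.Linear(128, 10).cuda(local_rank), device_ids=[local_rank])

    # Per-GPU batch size; the effective global batch size is
    # per_gpu_batch_size * WORLD_SIZE (e.g., 128 * 4 GPUs = 512).
    per_gpu_batch_size = 128
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=per_gpu_batch_size, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()   # DDP all-reduces (averages) gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=NUM_GPUS this_script.py
```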
So I have three questions:
1. Does the validation accuracy (and loss) change when we run the training on a larger number of GPUs?
2. If I increase the batch size to utilize the GPU memory, does that affect the validation accuracy and loss?
3. Do we need to retune the other hyperparameters (e.g., learning rate, weight decay, number of epochs) when changing the batch size in the above scenario? (For the learning rate, I have the scaling rule sketched after this list in mind.)
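To clarify what I mean in question 3 about adjusting the learning rate: the adjustment I keep seeing referenced is the linear scaling rule from Goyal et al. (2017), "Accurate, Large Minibatch SGD". A minimal sketch, assuming the base learning rate was tuned at batch size 32 (all numbers here are placeholders):

```python
# Linear scaling rule (Goyal et al., 2017): scale the learning rate
# in proportion to the increase in the (global) batch size.
base_lr = 0.1          # learning rate tuned for the original batch size
base_batch_size = 32   # batch size the base LR was tuned for
new_batch_size = 512   # larger batch size chosen to fill GPU memory

scaled_lr = base_lr * (new_batch_size / base_batch_size)  # 0.1 * 16 = 1.6
print(f"scaled learning rate: {scaled_lr}")

# The same paper also recommends a gradual learning-rate warmup
# when the scaled value is large.
```

Is this kind of rescaling actually necessary in practice, or can the original hyperparameters be kept?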