I have more of a conceptual question. When training a neural-network model on a single node with multiple GPUs (especially through DDP), there is a natural urge to utilize the maximum GPU memory. This is usually done by increasing the training batch size (e.g., from 32 to 64, 128, 256, 512, or even 1024 samples).
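For concreteness, my setup looks roughly like the minimal DDP sketch below (the toy model, data, and batch size are placeholders, not my actual training code). Note that with DDP the effective global batch size is the per-GPU batch size times the number of GPUs:

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy data and model, stand-ins for the real ones.
    dataset = TensorDataset(torch.randn(4096, 128), torch.randint(0, 10, (4096,)))
    model = DDP(nn.Linear(128, 10).cuda(local_rank), device_ids=[local_rank])

    # Per-GPU batch size; the effective global batch size is
    # per_gpu_batch_size * WORLD_SIZE (e.g., 128 * 4 GPUs = 512).
    per_gpu_batch_size = 128
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=per_gpu_batch_size, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()   # DDP all-reduces (averages) gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=NUM_GPUS this_script.py
```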
So I have three questions:
1. Does the validation accuracy (and loss) change when we run the training on a larger number of GPUs?
2. If I increase the batch size to utilize the GPU memory, does that affect the validation accuracy and loss?
3. Do we need to retune the other hyperparameters (e.g., learning rate, weight decay, number of epochs) when changing the batch size in the above scenario? (For the learning rate, I have the scaling rule sketched after this list in mind.)
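To clarify what I mean in question 3 about adjusting the learning rate: the adjustment I keep seeing referenced is the linear scaling rule from Goyal et al. (2017), "Accurate, Large Minibatch SGD". A minimal sketch, assuming the base learning rate was tuned at batch size 32 (all numbers here are placeholders):

```python
# Linear scaling rule (Goyal et al., 2017): scale the learning rate
# in proportion to the increase in the (global) batch size.
base_lr = 0.1          # learning rate tuned for the original batch size
base_batch_size = 32   # batch size the base LR was tuned for
new_batch_size = 512   # larger batch size chosen to fill GPU memory

scaled_lr = base_lr * (new_batch_size / base_batch_size)  # 0.1 * 16 = 1.6
print(f"scaled learning rate: {scaled_lr}")

# The same paper also recommends a gradual learning-rate warmup
# when the scaled value is large.
```

Is this kind of rescaling actually necessary in practice, or can the original hyperparameters be kept?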