Should the learning rate change according to the number of GPUs?

Hi,

On 1 GPU, with batch_size=32 and lr=0.0001, I get good accuracy.
I would like to use 8 GPUs to retrain the model.
My question is:
Should I change lr to 0.0001*8 when I use 8 GPUs?

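Yes, of course the LR should change. Follow the linear scaling rule from this paper (when the minibatch size is multiplied by k, multiply the learning rate by k): https://arxiv.org/pdf/1706.02677.pdf
So if batch_size=32 is per GPU and the total batch becomes 32*8=256 on 8 GPUs, then lr=0.0001*8 is the right starting point. If the total batch size stays at 32, keep the LR as it is.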
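To make the rule concrete, here is a minimal sketch in plain Python. The function names `scaled_lr` and `warmup_lr` and the `warmup_steps` value are hypothetical, chosen only for illustration. It assumes batch_size=32 is the per-GPU batch size. The paper's experiments use SGD with momentum, so with other optimizers treat the result as a starting point to tune rather than a guarantee.

```python
def scaled_lr(base_lr, base_batch_size, per_gpu_batch_size, num_gpus):
    """Linear scaling rule: multiply the LR by k when the total batch grows by k."""
    total_batch_size = per_gpu_batch_size * num_gpus
    k = total_batch_size / base_batch_size
    return base_lr * k


def warmup_lr(base_lr, target_lr, step, warmup_steps):
    """Ramp linearly from base_lr to target_lr over warmup_steps, then hold target_lr."""
    if step >= warmup_steps:
        return target_lr
    return base_lr + (target_lr - base_lr) * step / warmup_steps


# Values from the question: 1 GPU, batch_size=32, lr=0.0001, moving to 8 GPUs.
base_lr = 0.0001
target_lr = scaled_lr(base_lr, base_batch_size=32, per_gpu_batch_size=32, num_gpus=8)

# Hypothetical warmup length; the right value depends on the dataset and model.
warmup_steps = 1000
lr_at_start = warmup_lr(base_lr, target_lr, step=0, warmup_steps=warmup_steps)
lr_after_warmup = warmup_lr(base_lr, target_lr, step=warmup_steps, warmup_steps=warmup_steps)
```

The warmup part follows the same paper, which suggests gradually increasing the LR from the single-GPU value to the scaled value at the start of training instead of jumping straight to it.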
