Exactly the same loss curve for different LRs on multiple GPUs

I designed a model for object detection and trained it on one GPU; the loss decreased and I got a good result.

But when I try to train the same network on multiple GPUs, the loss does not decrease. Even stranger, the loss curve is exactly the same no matter whether the LR is 0.05 or 0.0005.

The same dataset and training pipeline are used for other models and work well on multiple GPUs.

Does anybody have an idea what the potential reason could be?

If you need more details, just ask in the comments.
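Not an answer, but a framework-free sketch of one common cause of this exact symptom (hypothetical; the names `Optimizer`, `params`, and the update rule below are illustrative, not from your code): if the optimizer ends up holding references to parameters that the multi-GPU wrapper no longer trains (e.g. the model is rebuilt or re-wrapped after the optimizer was created), its LR never influences the loss, so every LR produces the same curve.

```python
# Toy "optimizer" that updates parameters in place with a fixed LR.
class Optimizer:
    def __init__(self, params, lr):
        self.params = params  # keeps references to the ORIGINAL parameter lists
        self.lr = lr

    def step(self, grads):
        for p, g in zip(self.params, grads):
            p[0] -= self.lr * g  # in-place update on whatever it references


params = [[1.0]]
opt = Optimizer(params, lr=0.05)

# Simulate rebuilding/re-wrapping the model AFTER the optimizer was created:
# the training loop now uses fresh parameters the optimizer knows nothing about.
params = [[1.0]]

opt.step([0.5])
print(params[0][0])  # still 1.0 — the update went to the stale copies,
                     # so changing opt.lr would change nothing visible
```

If something like this is happening, it is worth checking that the optimizer was constructed from the parameters of the *wrapped* model actually used in the multi-GPU training step.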

Hi @Shangyin_Gao,

What’s the single-GPU LR? Did you also try scaling it, e.g. LR = LR * num_gpus?
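For reference, a minimal sketch of that linear scaling rule (assuming the effective batch size grows proportionally with the number of GPUs; the variable names here are illustrative):

```python
base_lr = 0.02   # the single-GPU LR mentioned in this thread
num_gpus = 2

# Linear scaling rule: multiply the base LR by the number of GPUs
# to compensate for the larger effective batch size.
scaled_lr = base_lr * num_gpus
print(scaled_lr)
```

With 2 GPUs this suggests trying around 0.04 rather than sweeping far below the single-GPU value.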

For a single GPU I use 0.02, and for 2 GPUs I tried LRs from 0.0005 to 0.05.