I designed a model for object detection and trained it on one GPU; the loss decreases and I got good results.
But when I try to train the same network on multiple GPUs, the loss does not decrease. Even stranger, the loss curve is exactly the same no matter whether the lr is 0.05 or 0.0005.
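To illustrate why the identical curve across learning rates is a strong clue (this is a framework-free sketch of plain SGD, not my actual training code): if gradients reaching the optimizer are effectively zero, e.g. because the graph between the multi-GPU replicas and the loss got cut, then the parameter "update" is the same for every lr, which would produce identical curves.

```python
def sgd_step(params, grads, lr):
    # One plain SGD update: p <- p - lr * g
    return [p - lr * g for p, g in zip(params, grads)]

params = [0.5, -1.2]

# Healthy case: nonzero gradients, so the lr changes the update.
grads = [0.3, -0.1]
a = sgd_step(params, grads, 0.05)
b = sgd_step(params, grads, 0.0005)
print(a != b)  # different lr -> different parameters

# Broken case: zero gradients (e.g. the backward graph was cut),
# so the step is identical no matter what lr is used.
zero_grads = [0.0, 0.0]
c = sgd_step(params, zero_grads, 0.05)
d = sgd_step(params, zero_grads, 0.0005)
print(c == d)  # same "update" for any lr -> identical loss curves
```

A quick sanity check along these lines in the real training loop (snapshot a parameter, take one step at two different lrs, compare the deltas) can confirm whether updates are actually being applied.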
The same dataset and training pipeline are used for other models and work well on multiple GPUs.
Does anybody have an idea what the potential reason could be?
If you need more details, just ask in the comments.