Multi-GPU training: DataParallel vs. Apex DDP for semantic segmentation

Hello. I am currently using two GPU machines in the lab: the first has 2 Titan Vs and the second has 4 Titan Vs.
When I train on the same dataset with the 4-GPU machine (using a 2× larger batch size), the 2-Titan-V machine still gives a better result.
The network is based on MobileNetV3.
I have read some articles saying that synchronized batch normalization could help, so I use the Apex version of it:

```python
model = apex.parallel.convert_syncbn_model(model)
```
However, the 2-GPU run still gives a better result.
So I kept reading, and other articles say that in some cases DDP (DistributedDataParallel) is required when using many GPUs.
Does anyone have the same experience with multi-GPU training?

My second thought is about multi-scale training. In published papers with code, I have seen multi-scale training used together with multiple GPUs. Does this affect the segmentation result?
Thank you.

Did you play around with some hyperparameters, e.g. did you try to lower the learning rate for the multi-GPU setup?
I assume your model doesn’t converge as well using 4 GPUs compared to the 2-GPU run?

I use the same hyperparameters, the same loss function, the same model, and the same initialization technique…
As you may remember, my dataset is quite imbalanced.
For this reason, I use the Focal-Tversky loss (in my experiments so far it gives the best result).
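For context, the loss looks roughly like this — a minimal binary-segmentation sketch with illustrative hyperparameter values, not my exact settings:

```python
import torch

def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=0.75, eps=1e-7):
    """Focal-Tversky loss for binary segmentation.

    probs:   predicted foreground probabilities, shape (N, H, W)
    targets: binary ground-truth masks, same shape
    alpha/beta weight false negatives vs. false positives in the
    Tversky index; gamma is the focal exponent. The defaults here are
    common illustrative choices, not tuned values.
    """
    probs = probs.flatten(1)
    targets = targets.flatten(1).float()
    tp = (probs * targets).sum(dim=1)          # true positives
    fn = ((1 - probs) * targets).sum(dim=1)    # false negatives
    fp = (probs * (1 - targets)).sum(dim=1)    # false positives
    tversky = (tp + eps) / (tp + alpha * fn + beta * fp + eps)
    return ((1 - tversky) ** gamma).mean()
```

A perfect prediction drives the Tversky index to 1 and the loss toward 0; the focal exponent `gamma` down-weights easy examples.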
However, if I use more GPUs, should I reduce the learning rate?
Could you explain why reducing it should help?
Thank you for your answer.

I’ve seen some experiments on large-scale systems where the learning rate was adapted to the batch size, as seen in Training ImageNet in 1 hour.
However, this effect should be much smaller in your setup. It might still be worth lowering it to see if it changes the convergence.
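The rule from that paper is just proportional scaling of the learning rate with the global (summed over all GPUs) batch size; which direction you adjust depends on which run you treat as the tuned baseline. A tiny sketch with made-up base values:

```python
def scaled_lr(base_lr, base_global_batch, new_global_batch):
    """Linear scaling rule: keep the learning rate proportional
    to the global batch size."""
    return base_lr * new_global_batch / base_global_batch

# Hypothetical example: if a base lr of 0.01 was tuned for the 2-GPU
# run with a global batch of 16, a 4-GPU run with global batch 32
# would use twice the learning rate:
print(scaled_lr(0.01, 16, 32))  # 0.02
```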

Yes, I will try that. These days I am realizing how important hands-on experience is for optimization in this field.
Thank you.