Hyperparameter settings when applying DDP

Hi guys,

I don’t know whether someone has asked this before, but I really want to make sure everything I did is correct. Say we have learning rate lr, number of epochs e, and batch size b as the normal single-GPU settings. Now we apply DDP on 2 GPUs (there is a rough sketch of what I mean right after the list):
1) If we want the effective batch size to stay the same as on a single card, we keep lr unchanged and set the per-GPU batch size to b/2.
2) If we want to double the effective batch size, given we have 2 GPUs, we keep the per-GPU batch size b as it is, but set lr = lr * 2 because of the larger batch.
3) Should we modify the number of epochs?
4) I’m doing semi-supervised learning, which involves two losses (i.e., a supervised loss and a semi-supervised loss). During training there is a weight applied to the semi-supervised loss; we don’t need to change it, right?
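To make settings 1) and 2) concrete, here is a rough sketch of what I have in mind (the toy dataset and model are just placeholders, and I’m assuming a torchrun launch on the 2 GPUs):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Assumes a launch like: torchrun --nproc_per_node=2 this_script.py
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
world_size = dist.get_world_size()           # 2 in my case

single_gpu_batch = 8                         # batch size b tuned on one card
base_lr = 1e-3                               # lr tuned for batch size b

# Setting 1): keep the effective batch size at b
#   -> per-GPU batch = b / world_size, lr unchanged
per_gpu_batch = single_gpu_batch // world_size
lr = base_lr
# Setting 2): keep the per-GPU batch at b -> effective batch = b * world_size,
#   so scale the lr linearly instead:
# per_gpu_batch, lr = single_gpu_batch, base_lr * world_size

# Dummy data/model just so the sketch runs end to end; replace with your own.
dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
sampler = DistributedSampler(dataset)        # shards samples across the ranks
loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)

model = DDP(torch.nn.Linear(10, 1).cuda(local_rank), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
```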

Cheers,

1) and 2) are correct.
3) If you use setting 1), I guess you do not need to modify the number of epochs; if you use setting 2), I guess you may need to adjust it a little, since it trains faster?

That’s weird. When I double the batch size, the convergence of the algorithm slows down… I’m testing b=4 and b=8 (per GPU) on two cards. After the first epoch, the smaller batch size always achieves better accuracy. Is that normal?

Thanks so much for your help.

After the first epoch, the smaller batch size always achieves better accuracy.

Do you have any suggestions on this?

Batch size will impact accuracy; this is normal.
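One thing that often helps the early epochs when you scale the lr linearly for the bigger batch is a short warmup (not something mentioned above, just a common trick). A rough sketch, with a hypothetical warmup_iters you would tune:

```python
import torch

# Sketch only: linearly warm the lr up over the first `warmup_iters` optimizer
# steps, then keep it constant. `warmup_iters` is a hypothetical knob to tune.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3)   # the already-doubled lr

warmup_iters = 500
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_iters)
)

# during training: optimizer.step(); scheduler.step()  -> once per iteration
```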

Yes, the final result improved a lot.