Hyperparameter settings when applying DDP

Hi guys,

I don’t know whether someone has asked this before, but I really want to make sure everything I did is correct. Say we have learning rate lr, number of epochs e, and batch size b as the normal single-GPU settings. Now we apply DDP on 2 GPUs (there is a rough sketch of what I mean right after the list):
1) If we want the effective batch size to stay the same as on a single card, we keep lr unchanged and set the per-GPU batch size to b/2.
2) If we want to double the effective batch size, given we have 2 GPUs, we keep the per-GPU batch size b as it is, but set lr = lr * 2 because of the larger batch.
3) Should we modify the number of epochs?
4) I’m doing semi-supervised learning, which involves two losses (i.e., a supervised loss and a semi-supervised loss). During training there is a weight applied to the semi-supervised loss; we don’t need to change it, right?
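To make settings 1) and 2) concrete, here is a rough sketch of what I have in mind (the toy dataset and model are just placeholders, and I’m assuming a torchrun launch on the 2 GPUs):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Assumes a launch like: torchrun --nproc_per_node=2 this_script.py
dist.init_process_group("nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)
world_size = dist.get_world_size()           # 2 in my case

single_gpu_batch = 8                         # batch size b tuned on one card
base_lr = 1e-3                               # lr tuned for batch size b

# Setting 1): keep the effective batch size at b
#   -> per-GPU batch = b / world_size, lr unchanged
per_gpu_batch = single_gpu_batch // world_size
lr = base_lr
# Setting 2): keep the per-GPU batch at b -> effective batch = b * world_size,
#   so scale the lr linearly instead:
# per_gpu_batch, lr = single_gpu_batch, base_lr * world_size

# Dummy data/model just so the sketch runs end to end; replace with your own.
dataset = TensorDataset(torch.randn(256, 10), torch.randn(256, 1))
sampler = DistributedSampler(dataset)        # shards samples across the ranks
loader = DataLoader(dataset, batch_size=per_gpu_batch, sampler=sampler)

model = DDP(torch.nn.Linear(10, 1).cuda(local_rank), device_ids=[local_rank])
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
```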

Cheers,

1) and 2) are correct.
3) If you use setting 1), I guess you do not need to modify the number of epochs; if you use setting 2), I guess you may need to adjust it a little, since it trains faster?

That’s weird. When I double the batch size, the convergence of the algorithm slows down… I’m testing b=4 and b=8 (per GPU) on two cards. After the first epoch, the smaller batch size always achieves better accuracy. Is that normal?

Thanks so much for your help.

After the first epoch, the smaller batch size always achieves better accuracy.

Do you have any suggestions on this?

Batch size will impact accuracy; this is normal.
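One thing that often helps the early epochs when you scale the lr linearly for the bigger batch is a short warmup (not something mentioned above, just a common trick). A rough sketch, with a hypothetical warmup_iters you would tune:

```python
import torch

# Sketch only: linearly warm the lr up over the first `warmup_iters` optimizer
# steps, then keep it constant. `warmup_iters` is a hypothetical knob to tune.
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=2e-3)   # the already-doubled lr

warmup_iters = 500
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda step: min(1.0, (step + 1) / warmup_iters)
)

# during training: optimizer.step(); scheduler.step()  -> once per iteration
```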

Yes, the final result improved a lot.