Weird loss curves using DistributedDataParallel

kaizhao · September 2, 2021, 7:35am

I’m converting a DataParallel (DP) model into DistributedDataParallel (DDP).

However, DDP degrades the test performance and produces zigzag loss curves during training.

Below is the loss curve during training (blue is DDP and orange is DP):

In general the two behave similarly, but when zooming in, the DDP curve shows periodic zigzags:

My training configurations are:

DP:
- batch size: 48
- 3 GPUs

DDP:
- batch size per GPU: 16
- 3 GPUs

Random seeds are fixed, and the learning rate and all other settings are the same.

Yanli_Zhao · September 7, 2021, 6:34pm

DP’s loss is aggregated over the full batch of 48, whereas DDP’s loss is computed independently on each rank’s batch of 16. The two curves are therefore expected to differ if you are looking at the loss on rank 0 only.
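
To make the batch-size point concrete, here is a small pure-Python simulation (no PyTorch needed). It is only a sketch under made-up assumptions: the per-sample losses are random noise around 1.0, and `NUM_STEPS`, the noise level, and the variable names are hypothetical. Only the 3 GPUs and the 16-per-GPU batch size come from the thread. It compares the mean over all 48 samples (what DP logs), the mean over rank 0's 16 samples (what a rank-0-only DDP log shows), and the average of the three per-rank means.

```python
import random
import statistics

WORLD_SIZE = 3        # number of GPUs, from the thread
PER_RANK_BATCH = 16   # DDP batch size per GPU, from the thread
NUM_STEPS = 200       # hypothetical number of logged steps

rng = random.Random(0)

global_curve = []   # DP-style log: mean over all 48 samples
rank0_curve = []    # DDP rank-0 log: mean over rank 0's 16 samples only
reduced_curve = []  # mean of the per-rank means (what averaging across ranks gives)

for _ in range(NUM_STEPS):
    # Hypothetical per-sample losses: Gaussian noise around 1.0 (an assumption).
    per_rank = [
        [rng.gauss(1.0, 0.5) for _ in range(PER_RANK_BATCH)]
        for _ in range(WORLD_SIZE)
    ]
    all_samples = [x for shard in per_rank for x in shard]

    global_curve.append(statistics.fmean(all_samples))
    rank0_curve.append(statistics.fmean(per_rank[0]))
    reduced_curve.append(statistics.fmean([statistics.fmean(s) for s in per_rank]))

# Step-to-step spread of each logged curve.
print("std of 48-sample mean :", statistics.pstdev(global_curve))
print("std of rank-0 mean    :", statistics.pstdev(rank0_curve))
print("std of averaged means :", statistics.pstdev(reduced_curve))
```

Because every rank holds the same number of samples, averaging the per-rank means reproduces the 48-sample mean, while the rank-0 curve alone is noisier. In a real DDP job, one way to log a comparable number would be to average a detached copy of the loss across ranks before logging, for example with `torch.distributed.all_reduce` using `ReduceOp.SUM` and dividing by the world size. This affects only what is logged, not the gradients, which DDP already averages across ranks.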