Losses in different ranks do not occur simultaneously when training with DDP

Hi, I’m training a point cloud analysis model with distributed data parallel (DDP). I followed all the rules for setting up DDP training. Everything is normal at the beginning, but the losses on the different GPUs no longer stay in step toward the end of the 1st epoch, for example:

come to epoch: 0, step: 429, loss: 0.046092418071222385
come to epoch: 0, step: 429, loss: 0.046092418071222385
come to epoch: 0, step: 429, loss: 0.046092418071222385
come to epoch: 0, step: 429, loss: 0.046092418071222385
come to epoch: 0, step: 430, loss: 0.04677124587302715
come to epoch: 0, step: 430, loss: 0.04677124587302715
come to epoch: 0, step: 430, loss: 0.04677124587302715
come to epoch: 1, step: 0, loss: 0.04677124587302715
come to epoch: 0, step: 431, loss: 0.03822317679159113
come to epoch: 0, step: 431, loss: 0.03822317679159113
come to epoch: 1, step: 1, loss: 0.03822317679159113
come to epoch: 0, step: 431, loss: 0.03822317679159113
come to epoch: 0, step: 432, loss: 0.0431362825095357
come to epoch: 0, step: 432, loss: 0.0431362825095357
come to epoch: 0, step: 432, loss: 0.0431362825095357
come to epoch: 1, step: 2, loss: 0.0431362825095357
come to epoch: 0, step: 433, loss: 0.04170320917830233
come to epoch: 1, step: 3, loss: 0.04170320917830233
come to epoch: 0, step: 433, loss: 0.04170320917830233
come to epoch: 0, step: 433, loss: 0.04170320917830233
come to epoch: 0, step: 434, loss: 0.042295407038902666
come to epoch: 0, step: 434, loss: 0.042295407038902666
come to epoch: 1, step: 4, loss: 0.042295407038902666
come to epoch: 0, step: 434, loss: 0.042295407038902666
come to epoch: 0, step: 435, loss: 0.040262431528578634
come to epoch: 1, step: 5, loss: 0.040262431528578634
come to epoch: 0, step: 435, loss: 0.040262431528578634
come to epoch: 0, step: 435, loss: 0.040262431528578634
come to epoch: 0, step: 436, loss: 0.04188207677967013
come to epoch: 0, step: 436, loss: 0.04188207677967013
come to epoch: 0, step: 436, loss: 0.04188207677967013
come to epoch: 1, step: 6, loss: 0.04188207677967013

Can anyone help?

Thanks.

I think it is because the data partitioned to each GPU is not even: the rank that receives fewer samples finishes epoch 0 early and starts epoch 1 while the other ranks are still on steps 430 and beyond.
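
If that is the cause, one common fix is to let `DistributedSampler` do the split so that every rank runs the same number of steps per epoch. Below is a minimal sketch, not your actual code; the dummy `TensorDataset`, batch size, and epoch count are placeholders standing in for your point cloud pipeline:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Hypothetical stand-in for the real point cloud dataset.
train_dataset = TensorDataset(torch.randn(1000, 1024, 3), torch.randint(0, 40, (1000,)))

# Assumes torch.distributed.init_process_group(...) has already been called,
# so the sampler can pick up the world size and rank from the default group.
sampler = DistributedSampler(
    train_dataset,
    shuffle=True,
    drop_last=True,  # drop the uneven tail so every rank sees the same number of batches
)
train_loader = DataLoader(train_dataset, batch_size=16, sampler=sampler)

for epoch in range(2):
    sampler.set_epoch(epoch)  # reshuffle consistently across ranks each epoch
    for step, (points, labels) in enumerate(train_loader):
        pass  # forward / backward / optimizer step goes here
```

By default `DistributedSampler` pads the dataset by repeating samples so all ranks get equal-length splits, which also avoids this mismatch; the problem usually shows up when the data is partitioned manually per rank. If the per-rank inputs really must be uneven, the `torch.distributed.algorithms.join.Join` context manager is, as far as I know, the intended way to let ranks finish at different times without hanging.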