How can I check whether DDP is working properly or not?

I compared training speed in the environment below, and DDP was 2 times slower than DP.
Is that expected, or is it a problem in my code?
Also, is all_reduce mandatory?

  • Train dataset: 3,000 images
  • ResNet-101
  • Batch_size: 32
  • 2 GPUs
  • workers: 8

You can compare DDP's training convergence with the DP run, or evaluate its accuracy or loss.
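One way to make that comparison concrete is to log the per-epoch loss from both runs and look at the largest relative gap. This is only a minimal sketch: the function names `max_relative_gap` and `curves_agree` and the 5% default tolerance are assumptions for illustration, not anything provided by PyTorch.

```python
from typing import Sequence


def max_relative_gap(reference: Sequence[float], candidate: Sequence[float]) -> float:
    """Return the largest epoch-by-epoch relative difference between two loss curves."""
    if len(reference) != len(candidate):
        raise ValueError("both runs must log the same number of epochs")
    if not reference:
        raise ValueError("loss curves must not be empty")
    # Guard the denominator so a reference loss of exactly 0 does not divide by zero.
    return max(abs(r - c) / max(abs(r), 1e-12) for r, c in zip(reference, candidate))


def curves_agree(reference: Sequence[float], candidate: Sequence[float],
                 tolerance: float = 0.05) -> bool:
    """True if the candidate run stays within `tolerance` (hypothetical 5%) of the reference."""
    return max_relative_gap(reference, candidate) <= tolerance
```

You would pass the DP losses as `reference` and the DDP losses as `candidate`. The two curves will not match exactly because of random seeds and batch ordering, so a tolerance makes more sense than an equality check.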

In terms of performance, 2 times slower is a little surprising. It depends on your code and hardware.
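To rule out a measurement problem before blaming the code, it may help to time both setups the same way. Below is a rough sketch of a timing helper; `seconds_per_step`, `step`, and `sync` are hypothetical names, and `step` stands for whatever runs one forward/backward/optimizer iteration in your script.

```python
import time
from typing import Callable, Optional


def seconds_per_step(step: Callable[[], None], steps: int = 50, warmup: int = 10,
                     sync: Optional[Callable[[], None]] = None) -> float:
    """Average wall-clock seconds per call of `step`, excluding warmup calls."""
    if steps <= 0:
        raise ValueError("steps must be positive")
    # Warmup calls absorb one-time costs (memory allocation, autotuning, worker startup).
    for _ in range(warmup):
        step()
    # Wait for pending asynchronous work before starting the clock.
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(steps):
        step()
    # Wait again so queued work is included in the measured time.
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / steps
```

On GPU you would pass `torch.cuda.synchronize` as `sync`, since CUDA kernels launch asynchronously and the timer would otherwise stop too early. Also make sure both runs process the same number of images per step: DP splits one batch of 32 across the 2 GPUs, while in DDP each process loads its own batch, so a per-process batch size of 32 means 64 images per step.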