How can I check whether DDP is working properly or not?
I compared training speed with the setup below, and DDP was 2 times slower than DP.
Is that expected in some cases, or does it point to a problem in my code?
Also, is all_reduce mandatory?
- Train dataset: 3,000 images
- ResNet-101
- Batch size: 32
- GPUs: 2
- Workers: 8
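
One thing worth ruling out before blaming the code is whether the two runs do the same amount of work per iteration. The sketch below is only arithmetic on the settings listed above, not a measurement. It assumes that the batch size of 32 is passed unchanged to the DataLoader in both runs, that DP splits each batch across the 2 GPUs, and that the DDP run uses one process per GPU with a DistributedSampler and drop_last=False. The helper name per_epoch_workload is made up for illustration.

```python
import math


def per_epoch_workload(num_samples, batch_size, num_gpus, mode):
    """Return (per_gpu_batch, global_batch, iterations_per_epoch).

    Hypothetical helper, pure arithmetic, no PyTorch needed.
    mode="dp":  one process; each DataLoader batch is split across the GPUs.
    mode="ddp": one process per GPU; each process loads its own full batch
                from its shard of the dataset (assumes DistributedSampler,
                drop_last=False).
    """
    if mode == "dp":
        # DataParallel scatters the batch along dim 0, so each GPU gets a chunk.
        per_gpu_batch = math.ceil(batch_size / num_gpus)
        global_batch = batch_size
        samples_per_loader = num_samples
    elif mode == "ddp":
        # Each DDP process feeds a full batch to its own GPU.
        per_gpu_batch = batch_size
        global_batch = batch_size * num_gpus
        # Each rank only iterates over its shard of the dataset.
        samples_per_loader = math.ceil(num_samples / num_gpus)
    else:
        raise ValueError("mode must be 'dp' or 'ddp'")

    iterations = math.ceil(samples_per_loader / batch_size)
    return per_gpu_batch, global_batch, iterations


# Settings from the post: 3,000 images, batch size 32, 2 GPUs.
dp = per_epoch_workload(3000, 32, 2, "dp")
ddp = per_epoch_workload(3000, 32, 2, "ddp")

print("DP  (per-GPU batch, global batch, iters/epoch):", dp)   # (16, 32, 94)
print("DDP (per-GPU batch, global batch, iters/epoch):", ddp)  # (32, 64, 47)
```

If these assumptions hold, each DDP iteration pushes 32 images through every GPU while each DP iteration pushes only 16, so a per-iteration timing would make DDP look slower even when it is working correctly. Comparing time per epoch or images per second would be a fairer test.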