Distributed Data Parallel allreduce

Is there a way to verify whether the allreduce operation is actually being called in multi-node DDP training with the NCCL backend? In my training, the results of single-node and distributed training appear similar. @mrshenli @apaszke

One option is to run your training script under nvprof, e.g. `nvprof --print-gpu-trace python your_script.py`, and check whether NCCL kernels show up in the GPU trace (their exact names vary by NCCL version, but they contain `AllReduce`).
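
Another way to confirm the allreduce path is exercised from inside PyTorch is to register a DDP communication hook that logs each invocation. This is a minimal sketch, assuming a PyTorch version that has `register_comm_hook` and the `GradBucket.buffer()` API (roughly 1.9+):

```python
import torch
import torch.distributed as dist

def logging_allreduce_hook(state, bucket):
    # Runs once per gradient bucket; seeing these prints confirms that
    # DDP's allreduce path is actually being exercised.
    print(f"[rank {dist.get_rank()}] allreduce on bucket with "
          f"{bucket.buffer().numel()} gradient elements")
    work = dist.all_reduce(bucket.buffer(), op=dist.ReduceOp.SUM, async_op=True)
    # Average the summed gradients, matching DDP's default behavior.
    return work.get_future().then(
        lambda fut: fut.value()[0] / dist.get_world_size()
    )

# After wrapping the model (local_rank is a placeholder for your device index):
# ddp_model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])
# ddp_model.register_comm_hook(state=None, hook=logging_allreduce_hook)
```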

> In my training the results of single node and distributed training appear similar.

You mean the speed is similar? What batch size is fed into each DDP instance? When using DDP, the per-process batch size should be set to original_batch_size / world_size, so that the effective global batch size (and hence the optimization behavior) matches the single-node run. See the sketch below for the usual wiring.
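
For illustration, a minimal sketch of that wiring with `DistributedSampler` (the toy dataset and `global_batch_size` here are placeholders, not from this thread):

```python
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset

# Toy dataset so the sketch is self-contained; replace with your own.
dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))

global_batch_size = 256  # the batch size used in the single-node run
world_size = dist.get_world_size()  # assumes init_process_group was called
per_rank_batch_size = global_batch_size // world_size

# DistributedSampler gives each rank a disjoint shard of the dataset, so
# with batch_size = global / world_size the effective global batch size
# matches the single-node baseline.
sampler = DistributedSampler(dataset)
loader = DataLoader(dataset, batch_size=per_rank_batch_size, sampler=sampler)
```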

No. I have divided the batch size by the world size.
I will check out nvprof and also create a minimal working example, since I cannot share the code.
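
For reference, a minimal working example of an allreduce smoke test might look like the sketch below (an illustration, not the poster's actual code; it assumes the script is started with a launcher such as `torchrun`, which sets `RANK`, `WORLD_SIZE`, and `LOCAL_RANK`):

```python
import os
import torch
import torch.distributed as dist

def main():
    # NCCL backend, as in the original question.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by the launcher
    torch.cuda.set_device(local_rank)

    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Each rank contributes its own rank id; after the allreduce every
    # rank should hold the sum 0 + 1 + ... + (world_size - 1).
    t = torch.full((1,), float(rank), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)

    expected = world_size * (world_size - 1) / 2
    print(f"rank {rank}: got {t.item()}, expected {expected}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

If every rank prints the expected sum, allreduce is working across nodes; if the values stay at each rank's own id, the ranks are not actually communicating.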