Is there a way to verify that the allreduce operation is actually being called in multi-node DDP training with the NCCL backend? In my training, the results of single-node and distributed training appear similar. @mrshenli @apaszke
One option is to use nvprof.
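(A sketch of two ways to check this, assuming a Linux launch command; `train.py` is a placeholder for your training script. NCCL can log each collective it launches when `NCCL_DEBUG=INFO` is combined with `NCCL_DEBUG_SUBSYS=COLL`, and in an nvprof GPU trace allreduce appears as `ncclAllReduce*` kernels. Exact log lines and kernel names vary with the NCCL/CUDA version.)

```shell
# Ask NCCL to log every collective call; look for "AllReduce" lines on each rank
NCCL_DEBUG=INFO NCCL_DEBUG_SUBSYS=COLL python train.py

# Or inspect the GPU kernel trace; gradient allreduce shows up as ncclAllReduce* kernels
nvprof --print-gpu-trace python train.py 2>&1 | grep -i ncclallreduce
```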
> In my training the results of single node and distributed training appear similar.
You mean the speed is similar? What is the batch size fed into each DDP instance? When using DDP, the batch_size should be divided by the world size, so that each DDP instance consumes 1/world_size of the global batch.
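For example (an illustrative sketch; the variable names are hypothetical, not from the thread), a global batch of 256 split across 8 DDP processes means each process should be fed 32 samples per step:

```python
# Illustrative: split a global batch size across DDP processes.
global_batch_size = 256
world_size = 8  # total number of DDP processes (typically one per GPU)

# Each DDP instance should consume 1/world_size of the global batch,
# so the effective batch size stays comparable to single-node training.
per_process_batch_size = global_batch_size // world_size
print(per_process_batch_size)  # -> 32
```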
No, the results are similar. I have already divided the batch size by the world size.
Will check out nvprof and also create a minimal working example, as I cannot share the code.
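As an alternative to profiling, a DDP communication hook can count allreduce calls directly from Python. A minimal single-process sketch (my own illustration, not from the thread), using the gloo backend so it runs on one machine; it assumes a PyTorch version where `register_comm_hook` and `GradBucket.buffer()` are available (roughly 1.10+). With nccl across nodes the hook fires at the same point in backward:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group so the sketch is self-contained.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

state = {"num_allreduce": 0}

def counting_allreduce_hook(state, bucket):
    # Count the call, then perform the usual allreduce and average
    # by world size, mirroring DDP's default gradient reduction.
    state["num_allreduce"] += 1
    fut = dist.all_reduce(bucket.buffer(), async_op=True).get_future()
    return fut.then(lambda f: f.value()[0] / dist.get_world_size())

model = DDP(torch.nn.Linear(4, 2))
model.register_comm_hook(state, counting_allreduce_hook)

# One forward/backward pass; DDP triggers the hook per gradient bucket.
model(torch.randn(8, 4)).sum().backward()
print("allreduce calls during backward:", state["num_allreduce"])
dist.destroy_process_group()
```

If the counter stays at zero after `backward()`, gradients are not being synchronized.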