DDP without Distributed Sampler

What happens when we do not pass a distributed sampler? Does it essentially iterate over all samples once per rank?

If the data is not sharded across the DDP ranks (i.e. with a distributed sampler or some custom sharding logic of your own), then yes, every rank will iterate over all samples (in your example I guess there are 2 ranks). DDP itself does not split the data; it only synchronizes gradients across ranks.

This is why in general you want to partition your data appropriately across ranks, so that different model replicas get different data.
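
To make the difference concrete, here is a minimal pure-Python sketch of what each rank sees with and without sharding. It is only an illustration: `shard_indices` is a hypothetical helper that mimics a strided, non-shuffled split with wrap-around padding, and the dataset size (10) and world size (2) are made-up numbers. In a real job you would normally pass `torch.utils.data.distributed.DistributedSampler` to your `DataLoader` rather than write this yourself.

```python
def shard_indices(num_samples, world_size, rank):
    """Hypothetical helper: strided, non-shuffled split of dataset indices.

    Pads by wrapping around to the start so every rank gets the same
    number of samples (a simplification, not PyTorch's actual code).
    """
    indices = list(range(num_samples))
    per_rank = -(-num_samples // world_size)  # ceiling division
    total = per_rank * world_size
    # Repeat leading indices until the length is divisible by world_size.
    indices += (indices * world_size)[: total - num_samples]
    # Each rank takes every world_size-th index, starting at its own rank.
    return indices[rank:total:world_size]


num_samples, world_size = 10, 2  # assumed values for illustration

# No sampler / no sharding: every rank walks the full dataset.
unsharded = {rank: list(range(num_samples)) for rank in range(world_size)}

# With sharding: each rank gets its own slice of the indices.
sharded = {
    rank: shard_indices(num_samples, world_size, rank)
    for rank in range(world_size)
}

for rank in range(world_size):
    print(f"rank {rank} without sharding: {unsharded[rank]}")
    print(f"rank {rank} with sharding:    {sharded[rank]}")
```

Without sharding, both ranks process the same 10 samples each epoch, so the replicas compute gradients on identical data. With sharding, each rank processes 5 distinct samples, so one epoch covers the dataset once across all ranks.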

Thanks Rohan, this cleared it up for me.