A couple of very simple questions. When shuffle=True, does DistributedSampler shuffle over the full dataset or just the shard of the replica/rank? What happens when the dataset size is not divisible by num_replicas?
The shuffled indices will be created based on the length of the entire dataset, as seen here.
If drop_last=True is used, the trailing indices will be dropped to make the number of dataset indices divisible by the number of ranks. Otherwise, the indices will be padded with repeated samples, as seen here.
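To make the behavior described above concrete, here is a minimal pure-Python sketch of the index logic. It is not the PyTorch source: shard_indices is a hypothetical helper written for illustration, and it uses Python's random module where the real sampler uses a seeded torch generator, so the exact permutation will differ. It also assumes a non-empty dataset.

```python
import math
import random


def shard_indices(dataset_len, num_replicas, rank,
                  shuffle=True, drop_last=False, seed=0):
    """Hypothetical helper imitating how DistributedSampler picks indices."""
    indices = list(range(dataset_len))
    if shuffle:
        # Every rank uses the same seed, so all ranks build the same
        # permutation of the *entire* dataset before it is split.
        random.Random(seed).shuffle(indices)

    if drop_last and dataset_len % num_replicas != 0:
        # Shrink the per-rank count so the total is divisible by num_replicas.
        num_samples = math.ceil((dataset_len - num_replicas) / num_replicas)
    else:
        num_samples = math.ceil(dataset_len / num_replicas)
    total_size = num_samples * num_replicas

    if drop_last:
        # Drop the tail of the (possibly shuffled) index list.
        indices = indices[:total_size]
    else:
        # Pad by repeating samples from the start of the index list.
        padding = total_size - len(indices)
        indices += (indices * math.ceil(padding / len(indices)))[:padding]

    # Each rank takes every num_replicas-th index, starting at its own rank.
    return indices[rank:total_size:num_replicas]


# 10 samples on 3 ranks, shuffle disabled so the result is easy to read.
padded = [shard_indices(10, 3, r, shuffle=False) for r in range(3)]
dropped = [shard_indices(10, 3, r, shuffle=False, drop_last=True) for r in range(3)]
print(padded)   # [[0, 3, 6, 9], [1, 4, 7, 0], [2, 5, 8, 1]]
print(dropped)  # [[0, 3, 6], [1, 4, 7], [2, 5, 8]]
```

With padding, samples 0 and 1 appear twice so that every rank gets 4 indices. With drop_last=True, sample 9 is skipped so that every rank gets 3.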
@ptrblck, thanks for the precise answer.