I have a dataset class which reads samples as (domain, image, label) tuples.
In each batch, I want to enforce that every domain is represented. For example, with 8 domains the batch size must be a multiple of 8 (i.e., 8·K for K samples per domain). I have written a custom shuffle function which sorts the entire pool of images and rearranges the indices so that each batch contains exactly K samples from each domain.
This sort is invoked via train_loader.dataset.sort('my_custom_shuffler') at every reset point (e.g., at each epoch boundary or under some other logic). To ensure that the DataLoader does not reshuffle my list afterwards, train_loader's shuffle argument has been set to False.
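Roughly, the rearrangement I perform is equivalent to the sketch below (the function name and signature are hypothetical stand-ins; my actual my_custom_shuffler differs in the details):

```python
import random
from collections import defaultdict

def domain_balanced_indices(domains, k):
    """Rearrange dataset indices so each consecutive block of
    num_domains * k indices contains exactly k samples per domain.

    `domains` is the per-sample domain label list; `k` is the
    number of samples to draw per domain per batch.
    """
    # Bucket sample indices by domain, then shuffle within each bucket.
    by_domain = defaultdict(list)
    for idx, d in enumerate(domains):
        by_domain[d].append(idx)
    for idxs in by_domain.values():
        random.shuffle(idxs)

    # Only as many balanced batches as the smallest domain allows.
    n_batches = min(len(v) for v in by_domain.values()) // k

    # Interleave: k indices from every domain, batch by batch.
    order = []
    for b in range(n_batches):
        for idxs in by_domain.values():
            order.extend(idxs[b * k:(b + 1) * k])
    return order
```

The resulting index list is then used as-is, which is why any downstream shuffling by the loader or sampler would destroy the balance.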
In my application, a DistributedSampler(shuffle=False) is created and assigned to train_loader before the training loop starts. Will my manual sorting interfere with the DistributedSampler (DS)? Does DS assign samples to each rank once at the very beginning and keep that assignment fixed? If so, my manual sorting should not affect DS's operation.
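For context, my setup looks roughly like this (a minimal sketch; the dataset and batch size are placeholders, and num_replicas/rank are passed explicitly here only so the snippet runs without init_process_group; in the real code they come from the process group):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Stand-in dataset (the real one yields (domain, image, label) and has a sort() method).
dataset = TensorDataset(torch.arange(16))

# shuffle=False: the sampler iterates indices 0..N-1 in order and
# hands rank r the strided slice [r::num_replicas] each epoch.
sampler = DistributedSampler(dataset, num_replicas=2, rank=0, shuffle=False)

# When a sampler is supplied, the DataLoader's own shuffle must stay False.
train_loader = DataLoader(dataset, batch_size=4, sampler=sampler, shuffle=False)
```

My question is essentially whether the index-to-rank assignment produced by this sampler is computed once, or recomputed against the (re-sorted) dataset on every iteration.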