How to handle non-determinism in DistributedSampler?

My use case requires non-deterministic batch composition, which is easy to handle with a custom BatchSampler in a single-GPU or DataParallel setup. For a DistributedDataParallel scenario, however, I can’t think of any way to solve the issue.

I’d like to get some feedback on a potential solution I have in mind for handling this under DDP.
Potential Solution
Since only one sequence of indices should be generated, is it reasonable to restrict index generation to rank 0 and use torch.distributed.broadcast to send the indices produced by rank 0 to the other ranks?
Something like this:

import torch
import torch.distributed as dist

# args.rank is this process's rank; the default process group is assumed
# to be initialized already (e.g. via torch.distributed.init_process_group)

# Simulate a non-deterministic batch sampler: every rank starts with a
# different ordering of indices
samples = torch.randperm(12).tolist()
print('Starting samples for rank:', args.rank, samples)

### Synced solution: broadcast rank 0's indices to all ranks
samples = torch.tensor(samples, dtype=torch.long, device=args.rank)
handle = dist.broadcast(samples, src=0, async_op=True)
handle.wait()  # synchronize with the broadcast before reading samples
print('Samples in rank:', args.rank, samples)

==output==
Starting samples for rank: 1 [6, 0, 8, 11, 3, 4, 7, 5, 2, 9, 10, 1]
Starting samples for rank: 0 [3, 5, 6, 9, 7, 11, 10, 1, 8, 0, 2, 4]
Samples in rank: 1 tensor([ 3,  5,  6,  9,  7, 11, 10,  1,  8,  0,  2,  4], device='cuda:1')
Samples in rank: 0 tensor([ 3,  5,  6,  9,  7, 11, 10,  1,  8,  0,  2,  4], device='cuda:0')

I can move this logic inside a CustomDistributedSampler that broadcasts rank 0’s indices across all devices and then takes a per-rank slice, similar to PyTorch’s DistributedSampler, to return the indices.
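Roughly, something like this (the class name, constructor arguments, and the randperm stand-in for my actual non-deterministic sampling logic are placeholders; the strided slice mirrors what PyTorch’s DistributedSampler does):

import torch
import torch.distributed as dist
from torch.utils.data import Sampler

class CustomDistributedSampler(Sampler):
    # Hypothetical sampler: rank 0 generates the (non-deterministic) index
    # order and broadcasts it, so every rank slices the same list.
    def __init__(self, dataset, num_replicas, rank, device):
        self.dataset = dataset
        self.num_replicas = num_replicas
        self.rank = rank
        self.device = device
        # pad so the total number of indices is divisible by num_replicas,
        # mirroring torch.utils.data.distributed.DistributedSampler
        self.num_samples = (len(dataset) + num_replicas - 1) // num_replicas
        self.total_size = self.num_samples * num_replicas

    def __iter__(self):
        if self.rank == 0:
            # stand-in for the custom, non-deterministic ordering logic
            indices = torch.randperm(len(self.dataset), device=self.device)
        else:
            indices = torch.empty(len(self.dataset), dtype=torch.long,
                                  device=self.device)
        dist.broadcast(indices, src=0)  # now identical on every rank
        indices = indices.tolist()
        # pad with repeated indices, then take this rank's strided slice
        indices += indices[: self.total_size - len(indices)]
        return iter(indices[self.rank : self.total_size : self.num_replicas])

    def __len__(self):
        return self.num_samples

Each rank would then pass an instance of this sampler to its DataLoader as usual.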

Are there any issues with the above implementation? Is there a cleaner and/or more optimized solution?

@ptrblck Any suggestions?

I don’t fully understand your use case since you are broadcasting the same indices.
Is this intended, i.e. do you want to load the same samples on each rank?
Note that the DistributedSampler chunks the indices of the Dataset so that sample duplication is avoided.
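I.e. with an identically shuffled index list on every rank, each rank takes a disjoint strided slice, roughly:

# Simplified illustration of DistributedSampler's per-rank chunking
# (padding of the last incomplete chunk is omitted)
num_replicas = 2
indices = list(range(8))  # same shuffled order on every rank (shared seed)
for rank in range(num_replicas):
    print(rank, indices[rank::num_replicas])
# 0 [0, 2, 4, 6]
# 1 [1, 3, 5, 7]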

Sorry for the confusion.
I intend to send the same indices to all ranks. I am using a custom sampler for which even seeding everything still leads to non-deterministic index generation, which makes the default DistributedSampler unusable for me. For now, instead of finding and fixing the non-determinism, I am trying to broadcast the indices so that all sampler instances draw from an identical index list.

OK, I see, and your approach seems to work for this. However, note that, as described above, using the same indices on each rank is not the standard approach. Usually each rank uses its own chunk of the indices so that it trains on a separate subset.
If the dataset is the same on each rank, the model’s state as well, and you are using the same indices, you would just repeat the same operations on every rank.
Since I’m not familiar with your use case, this might indeed be what you want, so see this as a side note.

Yup, noted. Thanks for the confirmation.