I tried to imitate DistributedSampler (I need to return Python dicts instead of indices). I'm using a single-machine, multi-GPU, multi-worker setup.
I build the list of dicts I want to return and hand back the sublist of elements whose index satisfies index % dist.get_world_size() == dist.get_rank(). I believe this is what DistributedSampler does. I added some print statements, but I only see printouts from dist.get_rank() == 0, never from the other ranks. Is that expected? If so, am I only using 1/8 of the data, since dist.get_world_size() is 8 on my 8 GPUs?
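A minimal sketch of what I mean (class and attribute names are mine, not from my actual code), with rank/world_size injectable so it can run outside an initialized process group:

```python
import torch.distributed as dist
from torch.utils.data import Sampler


class DictDistributedSampler(Sampler):
    """Yields python dicts instead of indices, one shard per rank."""

    def __init__(self, records, rank=None, world_size=None):
        # fall back to the process group when rank/world_size aren't given
        if rank is None:
            rank = dist.get_rank()
        if world_size is None:
            world_size = dist.get_world_size()
        self.records = records
        self.rank = rank
        self.world_size = world_size

    def __iter__(self):
        # keep every world_size-th record starting at this rank,
        # i.e. the indices where index % world_size == rank
        shard = self.records[self.rank::self.world_size]
        print(f"rank {self.rank}: {len(shard)} records")
        return iter(shard)

    def __len__(self):
        return len(self.records) // self.world_size
```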
I also modified distributed.py (where DistributedSampler lives in the PyTorch source) to print self.rank, and again I only see rank 0. When do the other ranks come in, or get used?
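For reference, the sharding step I'm trying to reproduce can be mirrored without any process group at all; this standalone function is my approximation of the padding-and-striding that DistributedSampler's __iter__ performs:

```python
import math


def shard_indices(n, rank, num_replicas):
    """Approximate DistributedSampler's subsampling: pad the index list so its
    length is divisible by num_replicas, then take every num_replicas-th index
    starting at this rank."""
    num_samples = math.ceil(n / num_replicas)
    total_size = num_samples * num_replicas
    indices = list(range(n))
    # pad by wrapping around, so every rank gets an equal-sized shard
    indices += indices[: total_size - len(indices)]
    return indices[rank:total_size:num_replicas]
```

Running this for each rank shows that the shards are disjoint (up to padding) and together cover the whole dataset, so with 8 ranks each process should see 1/8 of the data but all of it is consumed across ranks.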