DataLoader shared list across workers with DDP

Hello,
I have a question about DataLoaders and implementing custom datasets in PyTorch: what is the best way to share data between multiple DataLoader workers?
With a certain probability, I would like to return a different index than the one passed to __getitem__(self, index). The goal is to implement an efficient replay buffer entirely inside the DataLoader.
Each worker should have access to a shared list of indexes that are candidates for rehearsal, and each worker should also be able to append indexes to that list.

def __getitem__(self, index):
    if self._shared_list_between_all_workers_ddp and random.random() > 0.5:
        # rehearsal: draw a previously seen index instead of the given one
        index = random.choice(self._shared_list_between_all_workers_ddp)
    elif random.random() < 0.05:
        if index not in self._shared_list_between_all_workers_ddp:
            # otherwise add the index with a 5% chance, if not stored yet
            self._shared_list_between_all_workers_ddp.append(index)

    ### do stuff

Best Jonas

Okay, I looked into it a little, and using simple multiprocessing arrays does the job:
https://docs.python.org/3/library/multiprocessing.html
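For concreteness, here is a minimal sketch of that idea (illustrative code of my own, not the linked example): a map-style dataset that keeps the buffered indexes in a fixed-size multiprocessing.Array, with a multiprocessing.Value as the fill counter. The names ReplayDataset and buffer_capacity and the toy data are assumptions, not from the original post. Both shared objects are created before the DataLoader starts its workers, so every worker inherits handles to the same shared memory; under DDP, each rank would still hold its own buffer unless you add extra communication between ranks.

import random
from multiprocessing import Array, Value

import torch
from torch.utils.data import Dataset, DataLoader

class ReplayDataset(Dataset):
    def __init__(self, data, buffer_capacity=1024):
        self.data = data
        # Shared fixed-size buffer of indexes ('l' = signed long) plus a
        # shared fill counter; created before the DataLoader forks workers.
        self._buffer = Array('l', buffer_capacity)
        self._count = Value('l', 0)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        with self._count.get_lock():  # serialize buffer reads and writes
            count = self._count.value
            if count > 0 and random.random() > 0.5:
                # rehearsal: draw a previously stored index instead
                index = self._buffer[random.randrange(count)]
            elif random.random() < 0.05 and count < len(self._buffer):
                # otherwise store this index with a 5% chance
                self._buffer[count] = index
                self._count.value = count + 1
        return self.data[index]

if __name__ == '__main__':
    dataset = ReplayDataset(torch.arange(100))
    loader = DataLoader(dataset, batch_size=8, num_workers=4)
    for batch in loader:
        pass  # training step goes here

For simplicity the sketch allows duplicate indexes in the buffer; a multiprocessing.Manager list or dict would allow membership tests like in the original snippet, at the cost of slower proxied access. Note that shared ctypes objects can only be shared with processes started from the process that created them, which is exactly what the DataLoader does with its workers.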

This was done in DP. See here: