How to store embeddings from different ranks in DistributedDataParallel mode?

I want to run my model on a dataset and store all embeddings using DistributedDataParallel. I created a dataloader with a DistributedSampler and now want to store all the embeddings in the form:
(image_name, embedding)

After that I want to save them as a CSV or pickle file.

Would it be correct to create a global list and store the data there, or will there be conflicts when writing to the list?

By “global list”, do you mean a Python global variable? Wouldn’t that create a separate global list per process? Who would be writing to the global list? BTW, any reason for not using nn.Embedding?

Yes, by “global list” I mean a global Python variable. I am using mp.spawn to start distributed training, so I thought that variables defined at module level in the script would be visible to all ranks. But after running the code, nothing had been written into the dict. What are the benefits of using nn.Embedding? I want to store image_name and embeddings.

Right, globals are per-process, so each spawned child process will have its own copy of the global variable.
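
A minimal sketch illustrating this, assuming mp.spawn from torch.multiprocessing: each child gets its own copy of the module-level list, so appends made in the workers never show up in the parent.

```python
import torch.multiprocessing as mp

results = []  # module-level ("global") list


def worker(rank):
    # Each spawned process has its own copy of `results`;
    # this append is invisible to the parent and to other ranks.
    results.append((rank, f"embedding_from_rank_{rank}"))
    print(f"rank {rank} sees {len(results)} item(s)")


if __name__ == "__main__":
    mp.spawn(worker, nprocs=2)
    # Back in the parent process the list is still empty.
    print("parent sees", len(results), "item(s)")  # -> 0
```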

What are the benefits of using nn.Embedding?

One benefit is that you can then run lookup ops on the GPU. And if you need to let the training process update the embeddings as well, nn.Embedding will make that easier.
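
For context, a minimal sketch of what a GPU-side lookup with nn.Embedding could look like. The name_to_idx mapping is an assumption here, since nn.Embedding is indexed by integers rather than by strings like image names:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical mapping from image names to row indices.
name_to_idx = {"img_001.jpg": 0, "img_002.jpg": 1, "img_003.jpg": 2}

emb = nn.Embedding(num_embeddings=len(name_to_idx), embedding_dim=128).to(device)

# The lookup runs on the GPU, and emb.weight can also be updated by training.
idx = torch.tensor([name_to_idx["img_002.jpg"]], device=device)
vector = emb(idx)  # shape: (1, 128)
```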

I want to store image_name and embeddings.

If you would like to pass that data back to the main process, one option is to use a multiprocessing SimpleQueue. See the example below.
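
A minimal sketch of that pattern, assuming two ranks and a hypothetical compute_embedding helper standing in for the real model forward pass: each worker pushes (image_name, embedding) pairs plus a sentinel into a SimpleQueue, and the main process drains the queue and pickles everything into one file.

```python
import pickle

import torch
import torch.multiprocessing as mp


def compute_embedding(name):
    # Placeholder for the real DDP forward pass on this rank's batch.
    return torch.randn(128)


def worker(rank, world_size, queue):
    # In the real application this loop would iterate over the rank's
    # shard of the dataloader (via DistributedSampler).
    for i in range(3):
        name = f"img_rank{rank}_{i}.jpg"
        emb = compute_embedding(name)
        # Plain Python data is copied by value through the queue.
        queue.put((name, emb.tolist()))
    queue.put(None)  # sentinel: this rank is done


if __name__ == "__main__":
    world_size = 2
    queue = mp.get_context("spawn").SimpleQueue()

    # join=False so the main process can drain the queue while the workers run.
    ctx = mp.spawn(worker, args=(world_size, queue),
                   nprocs=world_size, join=False)

    results, finished = [], 0
    while finished < world_size:
        item = queue.get()
        if item is None:
            finished += 1
        else:
            results.append(item)

    while not ctx.join():
        pass

    # Everything ends up in one file, written by the main process only.
    with open("embeddings.pkl", "wb") as f:
        pickle.dump(results, f)
```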

I am trying to understand the requirement. In your application, does each subprocess produce image embeddings independently and concurrently, and then you want to save those?

Yes, each subprocess generates embeddings from its dataloader batches. I want to process all my data (generate the embeddings) as fast as possible, which is why I want to use DistributedDataParallel, and then save everything in one file.
