MAML inner loop parallelization

I want to parallelize the inner loop of MAML.
Each inner-loop task produces its own loss together with its own autograd graph,
and after the iteration I have to aggregate the losses and then backpropagate through all of them.

My naive idea is to replace the loop with a map.
To do this, I guess I need to aggregate the losses from multiple threads,
e.g. torch.mean(torch.stack(list_of_loss_from_multiple_threads)).

Is it possible to aggregate graphs from worker threads and then do the backprop at once?
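To make the aggregation step concrete, here is a minimal sketch of what "stack the losses, then backprop once" looks like. The parameter `w` and the inputs are toy values standing in for the real per-task losses; they are not from the original post.

```python
import torch

# A shared parameter, standing in for the meta-learned weights.
w = torch.tensor(2.0, requires_grad=True)

# Hypothetical per-task losses, as if each came back from one inner-loop worker.
# Each element carries its own autograd graph rooted at the shared parameter w.
list_of_loss_from_multiple_threads = [(w * x) ** 2 for x in (1.0, 2.0, 3.0)]

# Aggregate into one scalar, then backpropagate through all graphs at once.
meta_loss = torch.mean(torch.stack(list_of_loss_from_multiple_threads))
meta_loss.backward()

# w.grad now holds the average of the per-task gradients.
print(w.grad)
```

Since each loss is a separate graph sharing the same leaf `w`, the single `backward()` call accumulates all of their gradients into `w.grad`.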



Hi, it’s pretty tough to give any concrete advice without first knowing what exactly you’re doing and what you’ve tried. Would you mind posting a snippet of code that indicates the inner loop that you’d like to parallelize, and any instructions needed to run the code? Thanks!

I actually want to try this out, but I am afraid that the gains from parallelizing the inner loop might be outweighed by communication overhead. Were you able to see speed-ups?

Further, you don't have to aggregate the losses from the different threads; you could compute the gradients within each thread and then aggregate the gradients across threads.

Sorry, I gave up on parallelizing MAML, so I don't have results that would address your concern. :sweat_smile:

Yes, this is possible, and it is how DataParallel is implemented: its parallel_apply() helper launches multiple threads, each of which builds its own autograd graph.
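A minimal CPU-only sketch of the same idea using plain `threading` rather than `parallel_apply` itself (the parameter `w`, the `worker` function, and the inputs are illustrative): each thread builds an independent graph rooted at a shared parameter, and a single backward pass runs after all threads join.

```python
import threading
import torch

# Shared parameter, standing in for the meta-learned weights.
w = torch.tensor(2.0, requires_grad=True)
losses = [None] * 3

def worker(i, x):
    # Each thread constructs its own autograd graph; graph construction
    # in PyTorch is thread-safe, so the graphs can be built concurrently.
    losses[i] = (w * x) ** 2

threads = [threading.Thread(target=worker, args=(i, x))
           for i, x in enumerate((1.0, 2.0, 3.0))]
for t in threads:
    t.start()
for t in threads:
    t.join()

# One backward pass through all of the per-thread graphs at once.
torch.mean(torch.stack(losses)).backward()
print(w.grad)
```

This mirrors the DataParallel pattern: parallel forward passes in threads, then a single aggregated backward on the main thread.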