MAML inner loop parallelization

Hi,
I want to parallelize the inner loop of MAML.
Each inner-loop iteration of MAML produces its own loss and its own autograd graph,
and after the iterations I have to aggregate the losses and then backpropagate.

My naive idea is to replace the loop with a map.
To do this, I guess I need to aggregate the losses from the worker threads,
e.g. torch.mean(torch.stack(list_of_loss_from_multiple_threads)).
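
Roughly, this is the sequential version I'd like to turn into a map (just a toy sketch; the meta_model, the random tasks, and the single inner step are placeholders):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

meta_model = nn.Linear(10, 1)                 # placeholder meta-model
params = list(meta_model.parameters())
tasks = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(4)]
inner_lr = 0.1

losses = []
for x, y in tasks:                            # the loop I'd like to replace with a map
    # one inner adaptation step, kept differentiable with create_graph=True
    inner_loss = F.mse_loss(meta_model(x), y)
    grads = torch.autograd.grad(inner_loss, params, create_graph=True)
    adapted_w, adapted_b = [p - inner_lr * g for p, g in zip(params, grads)]
    # outer loss with the adapted weights (each task builds its own graph)
    losses.append(F.mse_loss(F.linear(x, adapted_w, adapted_b), y))

loss = torch.mean(torch.stack(losses))        # aggregate the per-task losses
loss.backward()                               # one backward pass through all graphs
```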

Is it possible to aggregate graphs from worker threads and then do the backprop at once?

Thanks


Hi, it’s pretty tough to give any concrete advice without first knowing what exactly you’re doing and what you’ve tried. Would you mind posting a snippet of code that indicates the inner loop that you’d like to parallelize, and any instructions needed to run the code? Thanks!

I actually want to try this out, but I'm afraid the gains from parallelizing the inner loop might be outweighed by communication overhead. Were you able to see any speedups?

Also, you don't have to aggregate the losses from the different threads; you could compute the gradients within each thread and then aggregate the gradients across the threads.
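
For example, something along these lines (a rough sketch, written sequentially for clarity; meta_model, task_batches, and the MSE loss are placeholders, and each task_grads call could run in its own thread):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

meta_model = nn.Linear(10, 1)                 # placeholder model
params = list(meta_model.parameters())
task_batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(4)]

def task_grads(x, y):
    loss = F.mse_loss(meta_model(x), y)       # placeholder per-task loss
    return torch.autograd.grad(loss, params)  # gradients computed inside the worker

# aggregate gradients instead of losses
grads = [task_grads(x, y) for x, y in task_batches]
for p, per_task in zip(params, zip(*grads)):
    p.grad = torch.stack(per_task).mean(dim=0)
```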

Sorry, I gave up on parallelizing MAML, so I don't have any results that would address your concern. :sweat_smile:

Yes, this is possible, and this is how DataParallel is implemented. The parallel_apply() used inside DataParallel launches multiple threads, each of which builds its own autograd graph.
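
For example, something like this (a minimal sketch, not the DataParallel source; it assumes a CUDA device is available, and the per-task modules and inputs are made up):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import parallel_apply

num_tasks = 4
device = torch.device("cuda:0")               # parallel_apply expects CUDA tensors

# one small module and one input batch per task (placeholders)
modules = [nn.Linear(10, 1).to(device) for _ in range(num_tasks)]
inputs = [(torch.randn(8, 10, device=device),) for _ in range(num_tasks)]

# each worker thread runs its module and builds its own autograd graph
outputs = parallel_apply(modules, inputs, devices=[device] * num_tasks)

# aggregate the per-task losses and backprop through all graphs at once
losses = [out.pow(2).mean() for out in outputs]
torch.stack(losses).mean().backward()
```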