Truly parallel ensembles

Is it possible to concurrently pass data through an ensemble of neural networks (the ensemble contains num_models networks)? To give an idea of my current training pipeline, I do the following:

  1. Sample a batch of size n from the dataloader
  2. Pass that batch through one of the networks, returning a loss
  3. Add the loss to the total_loss so far
  4. Repeat from 1. with a different network until all the models have been iterated through once, then backprop total_loss to update all the models. Reset total_loss to 0 and start the outer loop once again.
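For reference, the loop described above might be sketched like this (the models, optimizer, and synthetic data are illustrative stand-ins, not from my actual code):

```python
import torch
import torch.nn as nn

# Hypothetical setup: num_models small models sharing one optimizer.
num_models, n, in_dim = 3, 8, 4
models = [nn.Linear(in_dim, 1) for _ in range(num_models)]
params = [p for m in models for p in m.parameters()]
optimizer = torch.optim.SGD(params, lr=0.01)
loss_fn = nn.MSELoss()

total_loss = torch.tensor(0.0)
for model in models:
    x = torch.randn(n, in_dim)          # step 1: sample a batch of size n
    y = torch.randn(n, 1)
    total_loss = total_loss + loss_fn(model(x), y)  # steps 2-3

optimizer.zero_grad()
total_loss.backward()                   # step 4: one backward for all models
optimizer.step()
```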

Similarly, there is a use case where I may wish to pass the same batch of data through all neural networks simultaneously.

I imagine that sampling a single batch of size n * num_models and passing it through all num_models networks simultaneously would be much faster, but I’m unsure how to do this. My instinct is to write a wrapper that runs .chunk or .split on the batch, but then I’d still be running a for loop over the models (imagine they live in a list, for example) and summing their losses, so we’re back to square one. Having said this, will PyTorch’s asynchronous execution actually parallelise this if I write it in a forward method?
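Concretely, the chunk-based wrapper I have in mind would look something like this (names are illustrative); note that the Python for loop is still there, which is exactly my concern:

```python
import torch
import torch.nn as nn

num_models, n, in_dim = 3, 8, 4
models = [nn.Linear(in_dim, 1) for _ in range(num_models)]

big_batch = torch.randn(n * num_models, in_dim)  # one sample of size n * num_models
chunks = big_batch.chunk(num_models)             # num_models tensors of shape (n, in_dim)

# Still a Python loop: each forward pass is launched one after another
# on the host side, even though the data was sampled in one go.
outputs = [model(chunk) for model, chunk in zip(models, chunks)]
total = torch.stack([o.sum() for o in outputs]).sum()
```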



If you have multiple devices, you can use a for loop and call each model separately with the corresponding input.
Since CUDA operations are executed asynchronously, all devices will be used at the same time.

However, if you are dealing with a single device, note that each CUDA call will be added to a queue, so the device will most likely be kept busy by one model at a time and the loop will not yield true parallelism.
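A minimal sketch of that multi-device pattern (the device assignment is illustrative, and this falls back to CPU when fewer GPUs are available):

```python
import torch
import torch.nn as nn

num_models, n, in_dim = 2, 8, 4

# One device per model if enough GPUs exist, otherwise fall back to CPU.
if torch.cuda.device_count() >= num_models:
    devices = [torch.device(f"cuda:{i}") for i in range(num_models)]
else:
    devices = [torch.device("cpu")] * num_models

models = [nn.Linear(in_dim, 1).to(d) for d in devices]
batch = torch.randn(n, in_dim)

# CUDA kernels are launched asynchronously, so with distinct devices these
# forward passes can overlap; on a single device they are queued serially.
outputs = [m(batch.to(d)) for m, d in zip(models, devices)]
total_loss = sum(o.pow(2).mean().cpu() for o in outputs)
```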


Great, thanks for clearing that up. In that case, would it be possible to design a layer class that is in fact multiple layers, and then, using something like an einsum, ensure that a data point is passed through each layer simultaneously?
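One possible sketch of such a layer, stacking the weights of num_models linear layers into a single tensor so one einsum evaluates all of them in a single batched matmul (this is a hand-rolled illustration, not an established API; `torch.func.vmap` is another route to the same effect):

```python
import torch
import torch.nn as nn

class EnsembleLinear(nn.Module):
    """num_models independent linear layers evaluated with one einsum."""
    def __init__(self, num_models, in_dim, out_dim):
        super().__init__()
        # Stacked weights: one (in_dim, out_dim) matrix per ensemble member.
        self.weight = nn.Parameter(torch.randn(num_models, in_dim, out_dim) * in_dim ** -0.5)
        self.bias = nn.Parameter(torch.zeros(num_models, out_dim))

    def forward(self, x):
        # x: (num_models, batch, in_dim) -> (num_models, batch, out_dim),
        # i.e. each model sees its own slice of the leading dimension.
        return torch.einsum("mbi,mio->mbo", x, self.weight) + self.bias.unsqueeze(1)

layer = EnsembleLinear(num_models=3, in_dim=4, out_dim=2)
x = torch.randn(3, 8, 4)   # same or different batch per model
out = layer(x)             # shape (3, 8, 2), a single kernel launch
```

To feed the *same* batch to every model, expand it first, e.g. `batch.unsqueeze(0).expand(3, -1, -1)`.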