I am trying to train an ensemble of models on a dataset. At the moment I am using a wrapper that collects all models of the ensemble in a single Module:
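import torch.nn as nn

class Ensemble(nn.Module):
    def __init__(self, models):
        super().__init__()
        # nn.ModuleList registers each member's parameters with the wrapper
        self.models = nn.ModuleList(models)

    def forward(self, x):
        # Run every member on the same input and collect the outputs
        y = []
        for model in self.models:
            y.append(model(x))
        return y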
This works well, but I think it is not very efficient: the for-loop does not parallelize well, and I still need a lot of memory for the backward pass, since all results are collected in a single loss function. What would be the best way to make the training of this ensemble more efficient?
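To make the memory concern concrete, here is a minimal sketch of one possible alternative, not a definitive fix. It assumes the ensemble members are independent (no shared parameters) and that the combined loss is just a sum of per-member losses. In that case, calling `backward()` per member inside the loop should give the same gradients while letting each member's graph be released before the next forward pass. The toy `nn.Linear` members, the random data, and the `MSELoss` criterion are placeholders, not part of the original setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the real ensemble members and data.
models = nn.ModuleList([nn.Linear(4, 1) for _ in range(3)])
optimizer = torch.optim.SGD(models.parameters(), lr=0.1)
criterion = nn.MSELoss()
x, target = torch.randn(8, 4), torch.randn(8, 1)

optimizer.zero_grad()
total_loss = 0.0
for model in models:
    loss = criterion(model(x), target)
    # Backward right away: this member's graph can be freed before the
    # next member runs, instead of keeping all graphs for one summed loss.
    loss.backward()
    total_loss += loss.item()  # keep only a Python float for logging
optimizer.step()
```

This only addresses peak memory; the loop itself is still sequential.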
I was thinking of separating the models during training and putting them on different GPUs, but I do not have enough GPUs to give every model its own GPU…
Is it somehow possible to loop over the dataset and over the models in order to efficiently distribute the training over the GPUs? Is there maybe a totally different approach?
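The idea of looping over both the dataset and the models could look roughly like the sketch below. It assumes independent members, one optimizer per member, and a simple round-robin placement when there are more models than GPUs. The names `available_devices` and `placement`, the toy `nn.Linear` members, and the in-memory `dataset` list are all hypothetical. The code falls back to the CPU if no GPU is visible.

```python
import torch
import torch.nn as nn

def available_devices():
    """Return all visible CUDA devices, or the CPU if there are none."""
    n = torch.cuda.device_count()
    return [torch.device(f"cuda:{i}") for i in range(n)] or [torch.device("cpu")]

torch.manual_seed(0)
devices = available_devices()

# Hypothetical toy members; several models may share one device.
models = [nn.Linear(4, 1) for _ in range(5)]
placement = [devices[i % len(devices)] for i in range(len(models))]
for model, device in zip(models, placement):
    model.to(device)

optimizers = [torch.optim.SGD(m.parameters(), lr=0.1) for m in models]
criterion = nn.MSELoss()

# Placeholder for a real DataLoader: three random batches.
dataset = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(3)]

for x, target in dataset:
    for model, device, opt in zip(models, placement, optimizers):
        # Copy the batch to the device that holds this member.
        xb = x.to(device, non_blocking=True)
        tb = target.to(device, non_blocking=True)
        opt.zero_grad()
        loss = criterion(model(xb), tb)
        loss.backward()  # each member has its own small graph
        opt.step()
```

Because CUDA kernels are launched asynchronously, work queued on different GPUs may overlap as long as the loop avoids synchronizing calls such as `loss.item()`; how much overlap this gives in practice would need to be measured.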
Thanks for helping!