I am working with ensembles of neural networks, roughly of the following form:
```python
import torch
import torch.nn as nn

class NNEnsemble(nn.Module):
    def __init__(self, n_estimators):
        super().__init__()
        self.n_estimators = n_estimators
        self.estimators = nn.ModuleList(
            [generate_model() for _ in range(n_estimators)]
        )

    def forward(self, x):
        # Evaluate every base model sequentially, then sum the predictions
        out = torch.stack([est(x) for est in self.estimators], dim=1)
        return torch.sum(out, dim=1)
```
generate_model() returns a small base model (e.g. a 3-layer MLP with 128 hidden neurons). As it turns out, the list comprehension / for loop in forward is rather slow: with an ensemble of 100 such networks I see less than 10% GPU utilization. Is there any way to speed up this operation, e.g. via some kind of “parallel” list comprehension?
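For concreteness, this is roughly the kind of batched evaluation I am hoping for, sketched with torch.func (stack_module_state + functional_call + vmap; assumes a recent PyTorch). The generate_model and the input sizes below are just placeholders for my setup:

```python
import torch
import torch.nn as nn
from torch.func import stack_module_state, functional_call

def generate_model():
    # placeholder for my base model: 3-layer MLP, 128 hidden neurons
    return nn.Sequential(
        nn.Linear(16, 128), nn.ReLU(),
        nn.Linear(128, 128), nn.ReLU(),
        nn.Linear(128, 1),
    )

models = [generate_model() for _ in range(100)]
# stack the per-model weights into single tensors with a leading
# ensemble dimension of size 100
params, buffers = stack_module_state(models)

# a "stateless" template on the meta device, so no second copy of
# the weights is materialized
base = generate_model().to("meta")

def call_single(p, b, x):
    return functional_call(base, (p, b), (x,))

x = torch.randn(32, 16)
# vmap over the ensemble dimension; the input x is shared (in_dims=None)
out = torch.vmap(call_single, in_dims=(0, 0, None))(params, buffers, x)
ensemble_out = out.sum(dim=0)  # same reduction as forward() above
```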
I found one post about grouped convolution that mentions a similar structure, but there was no real solution besides waiting for cuDNN support.
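If I understood that post correctly, the trick would be that a Conv1d with kernel_size=1 and groups=n computes n independent linear layers in a single kernel, something like the sketch below (all sizes are placeholders):

```python
import torch
import torch.nn as nn

n, d_in, d_out, batch = 100, 16, 128, 32

# groups=n splits the channels into n independent d_in -> d_out linear maps
layer = nn.Conv1d(n * d_in, n * d_out, kernel_size=1, groups=n)

x = torch.randn(batch, d_in)
x_rep = x.repeat(1, n).unsqueeze(-1)      # (batch, n*d_in, 1): one copy of x per member
out = layer(x_rep).view(batch, n, d_out)  # (batch, n, d_out)
```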