Ensemble loss not going down proportionally


I have implemented an ensemble consisting of 3-layer MLPs with the following architecture:

```python
class MLP(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(MLP, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.relu1 = torch.nn.ReLU()
        self.batch1 = torch.nn.BatchNorm1d(H)
        self.hidden1 = torch.nn.Linear(H, H)
        self.relu2 = torch.nn.ReLU()
        self.batch2 = torch.nn.BatchNorm1d(H)
        self.hidden2 = torch.nn.Linear(H, H)
        self.hidden3 = torch.nn.Linear(H, D_out)
        self.logSoftMax = torch.nn.LogSoftmax(dim=1)
        self.SoftMax = torch.nn.Softmax(dim=1)
```

When testing the ensemble, I don't see the loss decrease as the number of models increases. With a single model the loss starts at a reasonable value; as the ensemble grows, it first goes up and then comes back down. I suspect something is wrong with how we combine the predictions. This is how we are doing it:

```python
def avg_evaluate(models):
    loss_fn = torch.nn.NLLLoss()
    loss = 0
    for data, target in test_loader:
        data = data.view(data.shape[0], -1)
        y_preds = []
        for model in models:
            model.eval()
            # forward returns (log-probabilities, probabilities)
            y_pred, _ = model(data)
            y_preds.append(y_pred)

        # average the per-model outputs, then compute the NLL loss
        avg_pred = torch.div(torch.stack(y_preds, dim=0).sum(dim=0), len(models))
        loss = loss + loss_fn(avg_pred, target).item()

    loss = loss / len(test_loader)
    print("Final loss")
    print(loss)
```
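One thing I'm not sure about: since the models output LogSoftmax values, averaging them averages logs, which gives the log of the *geometric* mean of the probabilities rather than the log of their arithmetic mean. A quick check with two hypothetical per-model probabilities for the correct class:

```python
import math

# hypothetical probabilities two models assign to the correct class
p1, p2 = 0.9, 0.1

mean_of_logs = (math.log(p1) + math.log(p2)) / 2   # what averaging LogSoftmax outputs does
log_of_mean = math.log((p1 + p2) / 2)              # log of the averaged probabilities

# mean of logs = log of the geometric mean, always <= log of the arithmetic mean
print(mean_of_logs, log_of_mean)
```

So the two ways of averaging can give noticeably different losses when the models disagree.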

Does anyone have any idea of what could be wrong? Would appreciate any help! Thanks.