Problem in an ensemble of subnets

Hi, I am a beginner with PyTorch, and the following code confuses me a lot. :cry:

Instead of using two separate sub-models with one optimizer/scheduler each, I want to wrap the two sub-models in a single ensemble module, so that I only need to call mymodel.train()/eval(), mymodel.cuda()/cpu(), etc. once rather than once per sub-model. Besides, I would only need one optimizer/scheduler. Here is the demo code:

class MyModel(nn.Module):
    def __init__(self, subnet1, subnet2):
        super(MyModel, self).__init__()
        self.subnet1 = subnet1
        self.subnet2 = subnet2
    def loss(self, sample):
        # calculating the loss using self.subnet1 and self.subnet2
        return loss_tensor

mymodel = MyModel(net1, net2)
optimizer = torch.optim.Adam(mymodel.parameters(), lr=LR)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=step_size, gamma=gamma)

There is no forward method in the MyModel class, since subnet1 and subnet2 are nn.Module instances with their own forward methods. I use loss.backward() and the single optimizer/scheduler to train my ensemble.
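For reference, here is a minimal runnable sketch of this setup. It assumes two small nn.Linear layers as stand-ins for net1 and net2 and a summed MSE loss; both are placeholders for illustration, not the real sub-models or loss. It only checks that assigning the sub-models as attributes registers them, so parameters(), train() and eval() cover both.

```python
import torch
import torch.nn as nn


class MyModel(nn.Module):
    def __init__(self, subnet1, subnet2):
        super().__init__()
        # Assigning nn.Module instances as attributes registers them as submodules.
        self.subnet1 = subnet1
        self.subnet2 = subnet2

    def loss(self, x, target):
        # Assumed loss for illustration: sum of the two sub-models' MSE losses.
        mse = nn.functional.mse_loss
        return mse(self.subnet1(x), target) + mse(self.subnet2(x), target)


# Hypothetical sub-models standing in for net1 and net2.
net1 = nn.Linear(4, 1)
net2 = nn.Linear(4, 1)
mymodel = MyModel(net1, net2)

# parameters() walks both registered submodules.
n_params = sum(p.numel() for p in mymodel.parameters())
n_expected = (sum(p.numel() for p in net1.parameters())
              + sum(p.numel() for p in net2.parameters()))

# eval() and train() propagate to the submodules.
mymodel.eval()
eval_flags = (net1.training, net2.training)
mymodel.train()

# One optimizer over the wrapper updates both sub-models.
optimizer = torch.optim.Adam(mymodel.parameters(), lr=1e-3)
x, target = torch.randn(8, 4), torch.randn(8, 1)
optimizer.zero_grad()
loss_tensor = mymodel.loss(x, target)
loss_tensor.backward()
optimizer.step()
```

So the wrapper itself behaves as expected here; any accuracy gap would have to come from how the loss or the optimizer settings differ from the two-model version.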

BUT I found the accuracy is ~8% lower than with the two separate sub-models under the same settings (the latter using two optimizers/schedulers). Is there something I’ve missed, or is the degradation just bad luck? :cry:
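One thing that may be worth checking is whether the two separate optimizers used different settings. If they did, a single optimizer can keep per-sub-model settings through parameter groups. The sketch below assumes two hypothetical nn.Linear sub-models, and the learning rates, step_size and gamma are made-up values, not the ones from the real experiment.

```python
import torch
import torch.nn as nn

# Hypothetical sub-models; the learning rates below are assumed values.
net1 = nn.Linear(4, 1)
net2 = nn.Linear(4, 1)

# One optimizer, one parameter group per sub-model, each with its own lr.
optimizer = torch.optim.Adam([
    {"params": net1.parameters(), "lr": 1e-3},
    {"params": net2.parameters(), "lr": 1e-4},
])
# StepLR multiplies the lr of every group by gamma every step_size scheduler steps.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.5)

x, target = torch.randn(8, 4), torch.randn(8, 1)
mse = nn.functional.mse_loss
optimizer.zero_grad()
# With a plain sum, each sub-model only receives gradient from its own term.
loss = mse(net1(x), target) + mse(net2(x), target)
loss.backward()
optimizer.step()
scheduler.step()

# Current learning rate of each parameter group after one scheduler step.
lrs = [group["lr"] for group in optimizer.param_groups]
```

A single scheduler applies the same schedule to every group, so if the two original schedulers had different step sizes or gammas, that difference would be lost with this approach.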

Furthermore, are there any other ways to integrate my sub-models? :smile:

I think the point of ensemble models is to train them separately; otherwise you’ll (at least morally) get a single “twice as wide network with little interaction between the two halves” model.
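A rough sketch of what I mean, assuming two hypothetical nn.Linear members, toy regression data, and plain output averaging as the way to combine them (AveragingEnsemble is a made-up name, and averaging is only one possible choice): each member is trained with its own optimizer, and the wrapper is only used at inference time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sub-models and toy regression data, for illustration only.
nets = [nn.Linear(4, 1), nn.Linear(4, 1)]
x, target = torch.randn(32, 4), torch.randn(32, 1)

# Train each member on its own, with its own optimizer.
for net in nets:
    optimizer = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(20):
        optimizer.zero_grad()
        loss = nn.functional.mse_loss(net(x), target)
        loss.backward()
        optimizer.step()


class AveragingEnsemble(nn.Module):
    """Hypothetical wrapper: averages member outputs at inference time."""

    def __init__(self, members):
        super().__init__()
        # nn.ModuleList registers the members, so eval()/to() reach all of them.
        self.members = nn.ModuleList(members)

    def forward(self, x):
        # Stack member outputs along a new leading dim and average over it.
        return torch.stack([m(x) for m in self.members]).mean(dim=0)


ensemble = AveragingEnsemble(nets).eval()
with torch.no_grad():
    prediction = ensemble(x)
```

This keeps the convenience of a single eval()/cuda()/cpu() call on the wrapper while leaving training of the members independent.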

Best regards


Thank you very much, Thomas. :smile: