Custom Ensemble approach

efficientnet_b0 and efficientnet_b4 models, but x takes out_features from the linear layer of efficientnet instead of in_features

ptrblck this is amazing… a way to truncate the last layer off the sub-model, which works well in my case. I also learnt the other two ways from you, so amazing all round, but I needed this one.

When I turn the sub-model into a nn.Sequential block,

        self.modelA = torch.nn.Sequential(*list(modelA.children())[:-1])

then I lose critical information from the modelA forward pass where it restructures the data between layers…

If I put a hook in,

        x1 = self.modelA(x)     ## x1 is unused; the forward pass only triggers the hook
        x1FC = activation['1FC']
        x = torch.cat((x1FC, x2FC), dim=1)

then the forward pass works, but it breaks the gradients in my final ensemble model. I am ultimately working to recreate the vanilla backpropagation explainability method to visualise the gradients on my input data, so I need the gradients to flow, and the hook disrupts them :frowning:
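The hook code isn't shown here, but one likely culprit (assuming the widely-copied forum recipe for activation hooks) is the detach() call inside the hook, which is exactly what cuts the graph:

```python
import torch
import torch.nn as nn

activation = {}

def get_activation(name):
    # the common hook recipe -- note the detach()
    def hook(module, inp, out):
        activation[name] = out.detach()  # detach() severs the stored tensor from the graph
    return hook

fc = nn.Linear(4, 2)
fc.register_forward_hook(get_activation('1FC'))

x = torch.randn(1, 4, requires_grad=True)
fc(x)
print(activation['1FC'].requires_grad)  # False: no gradient can flow through it
```

Storing the output without detach() would keep it attached to the graph, at the cost of holding the activation memory alive.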


    self.modelA = modelA
    self.modelA.fc = nn.Identity()

seems to be working wonders (for now!), very creative


Excuse me, why did you write the in_features as 2048 + 512?


Because the models used:

modelA = models.resnet50(pretrained=True)
modelB = models.resnet18(pretrained=True)

output activation tensors of shape [batch_size, 2048] and [batch_size, 512], respectively. You can of course just pass the actual sum to in_features, but I thought seeing the activation shapes separately would clarify how the concatenated feature shape is calculated.


Thanks for replying. So for any two models, I need to remove the classifier layer from both, concatenate their features, and then add a classifier layer, right?

Please, does nb_classes denote the number of classes that exist in my data?
def __init__(self, modelA, modelB, nb_classes=10):

I wouldn’t say you need to do it this way but it’s certainly one approach.



Appreciate your answer. What are the alternative approaches that I could search for?

Excuse me, does that code do ensemble learning, or does it remove the last layer from each model, concatenate the features, and then add a classifier, which would not be an ensemble?

This approach should come close to a stacking ensemble.


Excuse me, is this right? I'm confused, because when I read it I thought the code above used feature concatenation, not an ensemble:
Ensemble learning typically involves training multiple models independently and combining their predictions, whereas feature concatenation involves combining the extracted features from different models into a single feature vector before passing it through a classifier.

Use whatever works for you. My assumption is that predictions should be passed to a voting classifier, while features might be passed to a trainable classifier to further improve the performance.
However, your use case might differ, so use what works for you.