Custom Ensemble approach

I'm using efficientnet_b0 and efficientnet_b4 models, but x takes out_features from the linear layer of EfficientNet instead of in_features.

ptrblck, this is amazing… a way to truncate the last layer off the sub-model, which works well in my case. I also learnt the other two ways from you, so amazing all round, but this was the one I needed.

When I turn the sub-model into an nn.Sequential block,

        self.modelA = torch.nn.Sequential(*list(modelA.children())[:-1])

then I lose critical information from the modelA forward pass where it restructures the data between layers…

If I put a hook in,

        self.modelA.FCblock[2].register_forward_hook(get_activation('1FC'))
        x1 = self.modelA(x)     ## x1 is unused, I pass 
        x1FC = activation['1FC']
        ...
        x = torch.cat((x1FC, x2FC), dim=1)

then the forward pass works, but it breaks the gradients in my final ensemble model. I am ultimately working to recreate the vanilla backpropagation explainability method to visualise the gradients on my input data, so I need the gradients to flow, and the hook disrupts them :frowning:
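For what it's worth, a forward hook only disrupts gradients if the stored tensor is detached, which many get_activation recipes do via output.detach(). A minimal sketch (the toy model and names here are illustrative, not the code above) that keeps autograd intact:

```python
import torch
import torch.nn as nn

activation = {}

def get_activation(name):
    # Store the raw (non-detached) output so autograd can still trace through
    # it. Gradients only break if the hook stores output.detach() instead.
    def hook(module, inp, out):
        activation[name] = out
    return hook

model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))
model[0].register_forward_hook(get_activation('fc0'))

x = torch.randn(3, 8, requires_grad=True)
_ = model(x)                     # run forward to populate the hook dict
loss = activation['fc0'].sum()   # use the hooked activation downstream
loss.backward()
grads_flow = x.grad is not None  # gradients reach the input
```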

But

    self.modelA = modelA
    self.modelA.fc = nn.Identity()

seems to be working wonders (for now!), very creative
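For readers following along, here is a self-contained sketch of the nn.Identity trick; the TinyBackbone class and all sizes are made up for illustration (the real modelA/modelB would be the torchvision models discussed in this thread):

```python
import torch
import torch.nn as nn

# Tiny stand-in backbone: conv features -> global pool -> fc head, mimicking
# the features/fc split of the torchvision classification models.
class TinyBackbone(nn.Module):
    def __init__(self, feat_dim, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, feat_dim, 3, padding=1),
                                      nn.AdaptiveAvgPool2d(1),
                                      nn.Flatten())
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x):
        return self.fc(self.features(x))

modelA, modelB = TinyBackbone(32), TinyBackbone(16)

# The trick: swap each classification head for nn.Identity so the models
# return their penultimate features while keeping their original forward().
modelA.fc = nn.Identity()
modelB.fc = nn.Identity()

x = torch.randn(4, 3, 8, 8, requires_grad=True)
feats = torch.cat((modelA(x), modelB(x.clone())), dim=1)  # [4, 32 + 16]
head = nn.Linear(32 + 16, 10)
out = head(feats)

out.sum().backward()              # no hooks involved, so gradients flow freely
grads_flow = x.grad is not None
```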

Excuse me, why did you write

2048+512

Because the models used:

modelA = models.resnet50(pretrained=True)
modelB = models.resnet18(pretrained=True)

output an activation tensor in the shape [batch_size, 2048] and [batch_size, 512], respectively. You can of course just pass the actual sum to the in_features, but I thought seeing these activation shapes separately would clarify how the concatenated feature shape is calculated.
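As a quick sanity check of that arithmetic, dummy feature tensors with the quoted shapes can be concatenated (the batch size of 8 and 10 target classes are arbitrary choices for this example):

```python
import torch
import torch.nn as nn

# Dummy penultimate activations with the shapes quoted above.
featA = torch.randn(8, 2048)   # resnet50 with fc replaced by nn.Identity
featB = torch.randn(8, 512)    # resnet18 with fc replaced by nn.Identity

combined = torch.cat((featA, featB), dim=1)   # [8, 2048 + 512] = [8, 2560]
classifier = nn.Linear(2048 + 512, 10)        # in_features must match 2560
logits = classifier(combined)
```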

Thanks for replying! So for any two models, I need to remove the classifier layer from both, concatenate their features, and then add a new classifier layer, right?

Also, does nb_classes denote the number of classes that exist in my data?
def __init__(self, modelA, modelB, nb_classes=10):

I wouldn’t say you need to do it this way but it’s certainly one approach.

Yes.

I appreciate your answer. What are the alternative approaches that I could look into?

Excuse me, does that code do ensemble learning? Or is removing the last layer from each model, concatenating the features, and adding a classifier not an ensemble?

This approach should come close to a stacking ensemble.

Excuse me, is this right? I'm confused because when I read it, I thought the code above used feature concatenation, not an ensemble.
Ensemble learning typically involves training multiple models independently and combining their predictions, whereas feature concatenation involves combining the extracted features from different models into a single feature vector before passing it through a classifier.

My assumption is that predictions should be passed to a voting classifier, while features might be passed to a trainable classifier to further improve the performance.
However, your use case might differ, so use whatever works for you.
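To make the distinction concrete, here is a rough sketch of both strategies with dummy tensors (all shapes and the two-model setup are illustrative; in practice the logits and features would come from trained models):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Dummy per-model outputs for a batch of 4 samples and 3 classes.
logits_a = torch.randn(4, 3)   # model A predictions
logits_b = torch.randn(4, 3)   # model B predictions

# 1) Classic ensemble (soft voting): average the predicted probabilities.
avg_probs = (F.softmax(logits_a, dim=1) + F.softmax(logits_b, dim=1)) / 2
voted = avg_probs.argmax(dim=1)

# 2) Stacking via feature concatenation: combine penultimate features and
#    train a new classifier on top of them.
feats_a = torch.randn(4, 2048)  # e.g. resnet50 features
feats_b = torch.randn(4, 512)   # e.g. resnet18 features
head = nn.Linear(2048 + 512, 3)
stacked_logits = head(torch.cat((feats_a, feats_b), dim=1))
```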

Hi, I used the same code you posted before, but I get this error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x9 and 12288x3)

code of MyEnsemble

class MyEnsemble(nn.Module):
    def __init__(self, model_1, model_2, model_3, nb_classes):
        super(MyEnsemble, self).__init__()
        self.model_1 = model_1
        self.model_2 = model_2
        self.model_3 = model_3

        # Now remove the last layer
        self.model_1.classifier = nn.Identity()
        self.model_2.classifier = nn.Identity()
        self.model_3.classifier = nn.Identity()

        self.classifier = nn.Linear(4096 + 4096 + 4096, nb_classes)

    def forward(self, x):
        x1 = self.model_1(x.clone())  # clone to make sure x is not changed by inplace methods
        x1 = x1.view(x1.size(0), -1)

        x2 = self.model_2(x)
        x2 = x2.view(x2.size(0), -1)

        x3 = self.model_3(x)
        x3 = x3.view(x3.size(0), -1)

        # final
        x = torch.cat((x1, x2, x3), dim=1)
        x = F.relu(x.view(x.size(0), -1))
        x = self.classifier(x)

        return x

I'm using the same VGG-19 model, fine-tuned and pretrained on different data.
The code of the VGG-19 model:

import torchvision.models as models

class Custom_old_vgg19(nn.Module):
    def __init__(self, num_classes):
        super(Custom_old_vgg19, self).__init__()

        self.vgg19 = models.vgg19(pretrained=True)  # Use vgg19

        for param in self.vgg19.classifier.parameters():
          param.requires_grad = False

        self.vgg19.classifier = nn.Sequential(*[self.vgg19.classifier[i] for i in range(4)])
        self.vgg19.classifier = nn.Sequential(
            nn.Linear(25088, 4096),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(4096, 3))

    def forward(self, x):
        return self.vgg19(x)

summary of model

Custom_old_vgg19(
  (vgg19): VGG(
    (features): Sequential(
      (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): ReLU(inplace=True)
      (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (3): ReLU(inplace=True)
      (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (6): ReLU(inplace=True)
      (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (8): ReLU(inplace=True)
      (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (11): ReLU(inplace=True)
      (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (13): ReLU(inplace=True)
      (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (15): ReLU(inplace=True)
      (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (17): ReLU(inplace=True)
      (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (20): ReLU(inplace=True)
      (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (22): ReLU(inplace=True)
      (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (24): ReLU(inplace=True)
      (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (26): ReLU(inplace=True)
      (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
      (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (29): ReLU(inplace=True)
      (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (31): ReLU(inplace=True)
      (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (33): ReLU(inplace=True)
      (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (35): ReLU(inplace=True)
      (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    )
    (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
    (classifier): Sequential(
      (0): Linear(in_features=25088, out_features=4096, bias=True)
      (1): ReLU(inplace=True)
      (2): Dropout(p=0.5, inplace=False)
      (3): Linear(in_features=4096, out_features=3, bias=True)
    )
  )
)

summary of Ensemble model

MyEnsemble(
 (model_1): Custom_old_vgg19(
   (vgg19): VGG(
     (features): Sequential(
       (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (1): ReLU(inplace=True)
       (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (3): ReLU(inplace=True)
       (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (6): ReLU(inplace=True)
       (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (8): ReLU(inplace=True)
       (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (11): ReLU(inplace=True)
       (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (13): ReLU(inplace=True)
       (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (15): ReLU(inplace=True)
       (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (17): ReLU(inplace=True)
       (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (20): ReLU(inplace=True)
       (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (22): ReLU(inplace=True)
       (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (24): ReLU(inplace=True)
       (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (26): ReLU(inplace=True)
       (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (29): ReLU(inplace=True)
       (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (31): ReLU(inplace=True)
       (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (33): ReLU(inplace=True)
       (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (35): ReLU(inplace=True)
       (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
     )
     (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
     (classifier): Sequential(
       (0): Linear(in_features=25088, out_features=4096, bias=True)
       (1): ReLU(inplace=True)
       (2): Dropout(p=0.5, inplace=False)
       (3): Linear(in_features=4096, out_features=3, bias=True)
     )
   )
   (classifier): Identity()
 )
 (model_2): Custom_old_vgg19(
   (vgg19): VGG(
     (features): Sequential(
       (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (1): ReLU(inplace=True)
       (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (3): ReLU(inplace=True)
       (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (6): ReLU(inplace=True)
       (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (8): ReLU(inplace=True)
       (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (11): ReLU(inplace=True)
       (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (13): ReLU(inplace=True)
       (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (15): ReLU(inplace=True)
       (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (17): ReLU(inplace=True)
       (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (20): ReLU(inplace=True)
       (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (22): ReLU(inplace=True)
       (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (24): ReLU(inplace=True)
       (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (26): ReLU(inplace=True)
       (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (29): ReLU(inplace=True)
       (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (31): ReLU(inplace=True)
       (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (33): ReLU(inplace=True)
       (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (35): ReLU(inplace=True)
       (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
     )
     (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
     (classifier): Sequential(
       (0): Linear(in_features=25088, out_features=4096, bias=True)
       (1): ReLU(inplace=True)
       (2): Dropout(p=0.5, inplace=False)
       (3): Linear(in_features=4096, out_features=3, bias=True)
     )
   )
   (classifier): Identity()
 )
 (model_3): Custom_old_vgg19(
   (vgg19): VGG(
     (features): Sequential(
       (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (1): ReLU(inplace=True)
       (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (3): ReLU(inplace=True)
       (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (6): ReLU(inplace=True)
       (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (8): ReLU(inplace=True)
       (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (11): ReLU(inplace=True)
       (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (13): ReLU(inplace=True)
       (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (15): ReLU(inplace=True)
       (16): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (17): ReLU(inplace=True)
       (18): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (19): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (20): ReLU(inplace=True)
       (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (22): ReLU(inplace=True)
       (23): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (24): ReLU(inplace=True)
       (25): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (26): ReLU(inplace=True)
       (27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
       (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (29): ReLU(inplace=True)
       (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (31): ReLU(inplace=True)
       (32): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (33): ReLU(inplace=True)
       (34): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
       (35): ReLU(inplace=True)
       (36): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
     )
     (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
     (classifier): Sequential(
       (0): Linear(in_features=25088, out_features=4096, bias=True)
       (1): ReLU(inplace=True)
       (2): Dropout(p=0.5, inplace=False)
       (3): Linear(in_features=4096, out_features=3, bias=True)
     )
   )
   (classifier): Identity()
 )
 (classifier): Linear(in_features=12288, out_features=3, bias=True)
)

thank you for your time

Custom_old_vgg19 returns an output with 3 features:

...
nn.Linear(4096, 3))

which are then concatenated to an activation tensor with 3*3=9 features:

x = torch.cat((x1, x2, x3), dim=1)

self.classifier = nn.Linear(4096+4096+4096, nb_classes) expects 3*4096=12288 features (the 12288x3 in the error message) and raises the shape mismatch. Fix it via in_features=9 in MyEnsemble.
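To illustrate the mismatch and the suggested fix, here is a sketch with a scaled-down stand-in for the VGG-19 head (128 and 64 play the roles of the real 25088 and 4096 to keep the example light; the alternative shown at the end is one more option, not part of the answer above):

```python
import torch
import torch.nn as nn

# Scaled-down stand-in for Custom_old_vgg19's classifier head.
def make_head():
    return nn.Sequential(nn.Linear(128, 64),   # stands in for Linear(25088, 4096)
                         nn.ReLU(True),
                         nn.Dropout(0.5),
                         nn.Linear(64, 3))     # stands in for Linear(4096, 3)

heads = [make_head() for _ in range(3)]
x = torch.randn(2, 128)

# Each wrapped model emits 3 features, so the concatenation has 3 * 3 = 9:
feats = torch.cat([h(x) for h in heads], dim=1)
classifier = nn.Linear(9, 3)                   # in_features=9, as suggested above
out = classifier(feats)

# Alternative: replace each head's final Linear with nn.Identity so the models
# expose their wider penultimate features, matching the original
# nn.Linear(4096 + 4096 + 4096, nb_classes) design.
for h in heads:
    h[3] = nn.Identity()
feats_wide = torch.cat([h(x) for h in heads], dim=1)   # here [2, 64 * 3]
```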

Thank you, the code worked!
Is this method of ensembling a bagging approach?