Custom Ensemble approach

Hi @ptrblck, I followed your approach to create an ensemble model, but the model predicts only one output label (0). My data has 2 output labels (0, 1). I also tried changing the relu function to tanh. Could you help me understand where I am going wrong?

class Loss(torch.nn.modules.Module):
    def __init__(self, Wt1, Wt0):
        super(Loss, self).__init__()
        self.Wt1 = Wt1
        self.Wt0 = Wt0
    
    def forward(self, inputs, targets, phase):
        loss = - (self.Wt1[phase] * targets * inputs.log() + self.Wt0[phase] * (1 - targets) * (1 - inputs).log())
        return loss

class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=2):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Remove last linear layer
        self.modelA.fc = nn.Identity()
        self.modelB.fc = nn.Identity()
    
        # Create new classifier
        self.classifier = nn.Linear(2048+1664, nb_classes)
    
    def forward(self, x):
        x1 = self.modelA(x.clone())  # clone to make sure x is not changed by inplace methods
        x1 = x1.view(x1.size(0), -1)
        x2 = self.modelB(x)
        x2 = x2.view(x2.size(0), -1)
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(F.tanh(x))
        return x

# Train your separate models
# We use pretrained torchvision models here
modelA = resnet50(pretrained=True)
modelB = densenet169(pretrained=True)

# Freeze these models
for param in modelA.parameters():
    param.requires_grad_(False)

for param in modelB.parameters():
    param.requires_grad_(False)

# Create ensemble model
model = MyEnsemble(modelA, modelB)
model = model.cuda()

criterion = Loss(Wt1, Wt0)
optimizer = torch.optim.SGD(list(modelA.parameters()) + list(modelB.parameters()),     lr=0.00001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=1, verbose=True)

##### Train model
model = train_model(model, criterion, optimizer, dataloaders, scheduler, dataset_sizes, num_epochs=5)

Based on your code, you are using two pretrained models (modelA and modelB), freezing their parameters, and adding a new linear layer as the classifier.
This workflow is correct so far, but you are then passing only the frozen parameters to the optimizer and not the new (randomly initialized) classifier, so the classifier is never updated:

optimizer = torch.optim.SGD(list(modelA.parameters()) + list(modelB.parameters()),     lr=0.00001)

Try to pass either all parameters or just the params of the classifier to the optimizer:

optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)
# or
optimizer = torch.optim.SGD(model.classifier.parameters(), lr=1e-5)
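A minimal sketch of how the fix could be checked, assuming tiny linear layers as stand-ins for the frozen pretrained backbones (`backbone_a`, `backbone_b`, and `TinyEnsemble` are hypothetical names used only for illustration). It lists the parameters that still require gradients and confirms that one optimizer step changes the classifier but not the frozen backbone:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the frozen pretrained backbones.
backbone_a = nn.Linear(8, 4)
backbone_b = nn.Linear(8, 3)
for param in list(backbone_a.parameters()) + list(backbone_b.parameters()):
    param.requires_grad_(False)


class TinyEnsemble(nn.Module):
    def __init__(self, model_a, model_b, nb_classes=2):
        super().__init__()
        self.model_a = model_a
        self.model_b = model_b
        # New, randomly initialized classifier on the concatenated features
        self.classifier = nn.Linear(4 + 3, nb_classes)

    def forward(self, x):
        features = torch.cat((self.model_a(x), self.model_b(x)), dim=1)
        return self.classifier(torch.relu(features))


model = TinyEnsemble(backbone_a, backbone_b)

# Only parameters that still require gradients can be trained.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.weight', 'classifier.bias']

# Pass the classifier parameters (not the frozen backbones) to the optimizer.
optimizer = torch.optim.SGD(model.classifier.parameters(), lr=0.1)

bias_before = model.classifier.bias.detach().clone()
frozen_before = model.model_a.weight.detach().clone()

x = torch.randn(5, 8)
target = torch.randint(0, 2, (5,))
loss = nn.functional.cross_entropy(model(x), target)
optimizer.zero_grad()
loss.backward()
optimizer.step()

classifier_changed = not torch.equal(bias_before, model.classifier.bias)
backbone_changed = not torch.equal(frozen_before, model.model_a.weight)
```

Printing `trainable` before building the optimizer is a cheap way to spot the case where only frozen parameters were handed over.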

Hi, I am getting an error like this:

RuntimeError: shape ‘[0, -1]’ is invalid for input of size 131072.

Any idea?

The error message points to an invalid view operation, where it seems you’ve specified the size of the first dimension as 0:

x = torch.randn(10, 10)
x.view(0, -1)
> RuntimeError: shape '[0, -1]' is invalid for input of size 100
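As a rough sketch of one way to avoid passing a wrong batch size by hand, `torch.flatten` with `start_dim=1` keeps the first dimension and flattens the rest. The tensor shape below is a made-up example, not the shape from the failing model:

```python
import torch

# Hypothetical activation: batch of 4, 8 channels, 2x2 spatial size
x = torch.randn(4, 8, 2, 2)

# Equivalent ways to flatten everything except the batch dimension
flat_view = x.view(x.size(0), -1)
flat = torch.flatten(x, start_dim=1)
print(flat.shape)  # torch.Size([4, 32])

# Asking for a first dimension of 0 on a non-empty tensor fails
try:
    x.view(0, -1)
    view_failed = False
except RuntimeError:
    view_failed = True
```

If the error appears inside a forward method, printing the value used as the first argument of `view` right before the call should show where the 0 comes from.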

Excuse me, what is the name of this method? I read that there are different types of ensembles and methods for building them. What is this methodology called, and can I ensemble three models in this way?
Thanks in advance.

You should find more information and use cases by searching for “model ensemble” or “stacked classifiers”. sklearn gives an overview here and also provides the StackingClassifier class.
Yes, you can stack many models into an ensemble and could also use multiple levels of stacking.
Ensemble methods are (or were) often used in Kaggle competitions, and I think you might find a lot of resources on this topic there. E.g., you could browse through the winning solutions here and check which model(s) were used.

Thanks a lot for replying. So the way I should concatenate three models, for example, is a stacked classifier?
Can I use

class MyEnsemble(nn.Module):

for three models? Like this:

class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, modelC):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelC = modelC

Yes, it would be one option and reuses the approach I’ve previously posted.
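A possible three-model version could look like the sketch below. It assumes each sub-model already returns features (i.e. its last layer has been replaced), and it uses small hypothetical stand-in backbones (`make_backbone`) instead of real pretrained networks; `MyEnsemble3` and `feature_sizes` are illustrative names, not part of the original snippet:

```python
import torch
import torch.nn as nn


class MyEnsemble3(nn.Module):
    def __init__(self, modelA, modelB, modelC, feature_sizes, nb_classes=2):
        super().__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelC = modelC
        # in_features is the sum of the three feature sizes
        self.classifier = nn.Linear(sum(feature_sizes), nb_classes)

    def forward(self, x):
        # Run each sub-model and flatten its output to [batch_size, features]
        feats = [m(x).flatten(1) for m in (self.modelA, self.modelB, self.modelC)]
        x = torch.cat(feats, dim=1)
        return self.classifier(torch.relu(x))


def make_backbone(out_features):
    # Hypothetical stand-in for a pretrained feature extractor
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, out_features))


sizes = (16, 12, 8)
model = MyEnsemble3(*(make_backbone(s) for s in sizes), feature_sizes=sizes)
out = model(torch.randn(5, 3, 8, 8))
print(out.shape)  # torch.Size([5, 2])
```

Passing the feature sizes explicitly keeps the classifier's `in_features` in sync with whichever sub-models are plugged in.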


thanks a lot, I will try now … can you please see your inbox

Hello @ptrblck, I used this code snippet to ensemble two models, resnet18 and vgg16, but the performance was terrible. Can you help me ensemble the two models? I want to aggregate the learned features from the convolutional blocks of both models and pass them to a custom-defined classifier.

Code:
class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=4):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB

        self.modelA.classifier = nn.Identity()
        self.modelB.fc = nn.Identity()

        self.classifier = nn.Linear(512+25088, nb_classes)

    def forward(self, x):
        x1 = self.modelA(x.clone())
        x1 = x1.view(x1.size(0), -1)
        #print(x1.shape)
        x2 = self.modelB(x)
        x2 = x2.view(x2.size(0), -1)
        #print(x2.shape)
        x = torch.cat((x1, x2), dim=1)

        x = self.classifier(F.relu(x))
        return x

modelA = vgg16(pretrained=True)
modelB = resnet18(pretrained=True)

for param in modelA.parameters():
    param.requires_grad_(False)

for param in modelB.parameters():
    param.requires_grad_(False)

fused_model = MyEnsemble(modelA, modelB)

How were both models performing in isolation on your dataset and how does it compare to the new model?

@ptrblck I was careless with the optimizer (SGD) params: instead of passing the ensembled model's parameters, I had passed vgg16’s parameters. I sorted out the issue, thanks for the prompt though.

I changed model A and B but got this error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2000 and 2560x8)

I don’t know what you’ve changed but the error is usually raised by a linear layer when the expected in_features do not match the input activation features. Check which layer raises the error, check the activation shape, and make sure the feature dimensions match.
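A small sketch of that check, using the shapes from the error message (a `[1, 2000]` activation fed into a layer with `in_features=2560` and `out_features=8`). The tensors are random placeholders, and resizing the layer is only one possible fix, since the mismatch may instead mean the sub-models return something other than the expected features:

```python
import torch
import torch.nn as nn

features = torch.randn(1, 2000)   # activation shape reported as mat1 (1x2000)
classifier = nn.Linear(2560, 8)   # weight shape reported as mat2 (2560x8)

# Compare the activation's feature dimension with the layer's in_features
print(features.shape[1], classifier.in_features)  # 2000 2560

try:
    classifier(features)
    raised = False
except RuntimeError:
    raised = True

# One possible fix: size the layer from the actual activation shape
fixed = nn.Linear(features.shape[1], 8)
out = fixed(features)
print(out.shape)  # torch.Size([1, 8])
```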


I am using the efficientnet_b0 and efficientnet_b4 models, but x takes the out_features of EfficientNet's linear layer instead of its in_features.
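One possible cause, assuming the EfficientNet implementation in use names its last layer something other than fc (torchvision's EfficientNet, for example, exposes its head as classifier): assigning `model.fc = nn.Identity()` then only adds a new, unused attribute and the original head keeps running. The sketch below shows the effect with a hypothetical stand-in network (`StandInNet`), not a real EfficientNet:

```python
import torch
import torch.nn as nn


class StandInNet(nn.Module):
    """Hypothetical backbone whose head is called `classifier`, not `fc`."""

    def __init__(self):
        super().__init__()
        self.features = nn.Linear(6, 4)
        self.classifier = nn.Linear(4, 10)

    def forward(self, x):
        return self.classifier(self.features(x))


model = StandInNet()
x = torch.randn(2, 6)

# This only registers a new, unused submodule; the head still runs
model.fc = nn.Identity()
shape_with_head = model(x).shape
print(shape_with_head)  # torch.Size([2, 10])

# Replacing the attribute that forward() actually uses removes the head
model.classifier = nn.Identity()
shape_without_head = model(x).shape
print(shape_without_head)  # torch.Size([2, 4])

child_names = [name for name, _ in model.named_children()]
```

Printing the model, or listing `named_children()`, shows which attribute name holds the final layer before it is replaced.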

ptrblck this is amazing… A way to truncate the last layer off the sub-model which works well in my case. The other two ways I also learnt from you, so amazing all round, but I needed this way.

When I turn the sub-model into a nn.Sequential block,

        self.modelA = torch.nn.Sequential(*list(modelA.children())[:-1])

then I lose critical information from the modelA forward pass where it restructures the data between layers…

If I put a hook in,

        self.modelA.FCblock[2].register_forward_hook(get_activation('1FC'))
        x1 = self.modelA(x)     ## x1 is unused, I pass 
        x1FC = activation['1FC']
        ...
        x = torch.cat((x1FC, x2FC), dim=1)

then the forward pass works, but it breaks the gradients in my final ensemble model. I am ultimately working to recreate the vanilla backpropagation explainability method to visualise the gradients on my input data, so I need the gradients to flow, and the hook disrupts my gradients.

But

    self.modelA = modelA
    self.modelA.fc = nn.Identity()

seems to be working a wonder (for now!), very creative
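On the hook variant mentioned above: a likely reason for the broken gradients, assuming the `get_activation` helper follows the commonly posted pattern that stores `output.detach()`, is that the detached tensor is cut off from the autograd graph. The sketch below contrasts the two variants on a hypothetical single linear layer; both helper names are made up for illustration:

```python
import torch
import torch.nn as nn

activation = {}


def get_activation_detached(name):
    # Commonly posted pattern: the stored tensor is cut from the graph
    def hook(module, inputs, output):
        activation[name] = output.detach()
    return hook


def get_activation_attached(name):
    # Storing the output itself keeps it connected to the graph
    def hook(module, inputs, output):
        activation[name] = output
    return hook


layer = nn.Linear(4, 3)
x = torch.randn(2, 4, requires_grad=True)

h1 = layer.register_forward_hook(get_activation_detached("detached"))
h2 = layer.register_forward_hook(get_activation_attached("attached"))
layer(x)
h1.remove()
h2.remove()

print(activation["detached"].requires_grad)  # False
print(activation["attached"].requires_grad)  # True

# Gradients reach the input only through the attached activation
activation["attached"].sum().backward()
print(x.grad is not None)  # True
```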


Excuse me, why did you write

2048+512

Because the used models:

modelA = models.resnet50(pretrained=True)
modelB = models.resnet18(pretrained=True)

output an activation tensor in the shape [batch_size, 2048] and [batch_size, 512], respectively. You can of course just pass the actual sum to the in_features, but I thought seeing these activation shapes separately would clarify how the concatenated feature shape is calculated.
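The shape arithmetic can be sketched with random placeholder tensors standing in for the two feature outputs (no pretrained models are loaded here; `batch_size` and `nb_classes` are arbitrary example values):

```python
import torch
import torch.nn as nn

batch_size, nb_classes = 3, 10

# Placeholder features with the shapes described above
x1 = torch.randn(batch_size, 2048)  # resnet50 features after removing fc
x2 = torch.randn(batch_size, 512)   # resnet18 features after removing fc

# Concatenating along dim=1 adds up the feature dimensions
x = torch.cat((x1, x2), dim=1)
print(x.shape)  # torch.Size([3, 2560])

classifier = nn.Linear(2048 + 512, nb_classes)
out = classifier(x)
print(out.shape)  # torch.Size([3, 10])
```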


Thanks for replying. So for any two models, I need to remove the classifier layer from both, concatenate their outputs, and then add a new classifier layer, right?

Also, does nb_classes denote the number of classes in my data?
def __init__(self, modelA, modelB, nb_classes=10):

I wouldn’t say you need to do it this way but it’s certainly one approach.

Yes.
