Hi @ptrblck, I followed your approach to create an ensemble model, but the model is predicting only one output label (0). I have 2 output labels in my data (0, 1). I also tried changing the relu activation to tanh. Could you help me understand where I am going wrong?
class Loss(torch.nn.modules.Module):
    def __init__(self, Wt1, Wt0):
        super(Loss, self).__init__()
        self.Wt1 = Wt1
        self.Wt0 = Wt0

    def forward(self, inputs, targets, phase):
        loss = - (self.Wt1[phase] * targets * inputs.log() + self.Wt0[phase] * (1 - targets) * (1 - inputs).log())
        return loss


class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=2):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Remove last linear layer
        self.modelA.fc = nn.Identity()
        self.modelB.fc = nn.Identity()
        # Create new classifier
        self.classifier = nn.Linear(2048 + 1664, nb_classes)

    def forward(self, x):
        x1 = self.modelA(x.clone())  # clone to make sure x is not changed by inplace methods
        x1 = x1.view(x1.size(0), -1)
        x2 = self.modelB(x)
        x2 = x2.view(x2.size(0), -1)
        x = torch.cat((x1, x2), dim=1)
        x = self.classifier(F.tanh(x))
        return x
# Train your separate models
# We use pretrained torchvision models here
modelA = resnet50(pretrained=True)
modelB = densenet169(pretrained=True)

# Freeze these models
for param in modelA.parameters():
    param.requires_grad_(False)

for param in modelB.parameters():
    param.requires_grad_(False)

# Create ensemble model
model = MyEnsemble(modelA, modelB)
model = model.cuda()

criterion = Loss(Wt1, Wt0)
optimizer = torch.optim.SGD(list(modelA.parameters()) + list(modelB.parameters()), lr=0.00001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', patience=1, verbose=True)

##### Train model
model = train_model(model, criterion, optimizer, dataloaders, scheduler, dataset_sizes, num_epochs=5)
Based on your code you are using two pretrained models (modelA and modelB), freezing their parameters, and using a new linear layer as the classifier.
This workflow is correct so far, but you are then passing only the frozen parameters to the optimizer and not the new (randomly initialized) classifier.
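A minimal sketch of a possible fix (assuming the intent is to train only the new classifier head, since the backbones are frozen):

optimizer = torch.optim.SGD(model.classifier.parameters(), lr=1e-5)

# or, equivalently, keep only the parameters that still require gradients
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-5)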
Excuse me, what is the name of this method? I mean, I read that there are different types of ensembles and methods to build them. What is this methodology called, and can I ensemble three models in this way?
Thanks in advance
I think you will find more information and use cases when looking for “model ensemble” or “stacked classifiers”. sklearn gives an overview here and also provides the StackingClassifier class.
Yes, you can stack many models into an ensemble (see the sketch below) and could also use multiple levels of stacking.
Ensemble methods are (or were) often used in Kaggle competitions, and I think you will find a lot of resources for this topic there. E.g. you could browse through the winning solutions here and check which model(s) were used.
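Extending the snippet from earlier in the thread to three models could look like this (a minimal sketch; MyEnsembleThree is just an illustrative name, and the hard-coded feature sizes assume e.g. resnet50 + resnet50 + resnet18 backbones that all expose a .fc head):

import torch
import torch.nn as nn


class MyEnsembleThree(nn.Module):
    def __init__(self, modelA, modelB, modelC, nb_classes=2):
        super().__init__()
        self.modelA = modelA
        self.modelB = modelB
        self.modelC = modelC
        # replace each backbone's final linear layer with an identity
        self.modelA.fc = nn.Identity()
        self.modelB.fc = nn.Identity()
        self.modelC.fc = nn.Identity()
        # in_features must equal the sum of the three feature sizes
        self.classifier = nn.Linear(2048 + 2048 + 512, nb_classes)

    def forward(self, x):
        x1 = self.modelA(x.clone())
        x2 = self.modelB(x.clone())
        x3 = self.modelC(x)
        out = torch.cat((x1, x2, x3), dim=1)
        return self.classifier(out)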
Hello @ptrblck, I have used this code snippet to ensemble two models, resnet18 and vgg16, but the performance was terrible. Can you help me with how to ensemble the two models? I actually want to aggregate the learned features from the convolutional blocks of both models and pass them to a custom-defined classifier.
Code:
class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=4):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
@ptrblck I was careless with the optimizer (SGD) parameters: instead of passing the ensembled model’s parameters I had passed vgg16’s parameters. I have sorted out the issue; thanks for the prompt reply though.
I don’t know what you’ve changed, but the error is usually raised by a linear layer when the expected in_features do not match the input activation features. Check which layer raises the error, check the activation shape, and make sure the feature dimensions match.
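In case it helps to narrow this down for the resnet18 + vgg16 pair mentioned above, here is a small sketch that prints the two feature shapes (note that vgg16’s head is called .classifier, not .fc):

import torch
import torch.nn as nn
from torchvision import models

modelA = models.resnet18(pretrained=True)
modelB = models.vgg16(pretrained=True)
modelA.fc = nn.Identity()          # resnet18's last linear layer
modelB.classifier = nn.Identity()  # vgg16's classifier block

x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    fA = modelA(x)
    fB = modelB(x)

# the new classifier's in_features must equal fA.size(1) + fB.size(1)
print(fA.shape, fB.shape)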
ptrblck, this is amazing: a way to truncate the last layer off the sub-model, which works well in my case. The other two ways I also learnt from you, so amazing all round, but I needed this one.
When I turn the sub-model into an nn.Sequential block,
then I lose critical information from the modelA forward pass, where it restructures the data between layers…
If I put a hook in,
self.modelA.FCblock[2].register_forward_hook(get_activation('1FC'))
x1 = self.modelA(x) ## x1 is unused, I pass
x1FC = activation['1FC']
...
x = torch.cat((x1FC, x2FC), dim=1)
then the forward pass works, but it breaks the gradients in my final ensemble model. I am ultimately working to recreate the vanilla backpropagation explainability method to visualise the gradients on my input data, so I need the gradients to flow, and the hook disrupts them.
But
self.modelA = modelA
self.modelA.fc = nn.Identity()
seems to be working wonders (for now!), very creative
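In case the hook approach becomes interesting again later: if your get_activation helper detaches the output (as the commonly shared version does), that detach is what cuts the gradient flow. A sketch of a hook that keeps the stored tensor attached to the graph, reusing the activation/get_activation names from your snippet:

activation = {}

def get_activation(name):
    def hook(module, input, output):
        # store the raw output without .detach(), so gradients can still
        # flow through the stored tensor when it is used later in forward
        activation[name] = output
    return hook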
modelA = models.resnet50(pretrained=True)
modelB = models.resnet18(pretrained=True)
output activation tensors of shape [batch_size, 2048] and [batch_size, 512], respectively. You can of course just pass the actual sum to in_features, but I thought seeing these activation shapes separately would clarify how the concatenated feature shape is calculated.
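So for this pair the new classifier simply takes the sum of both feature sizes as in_features, e.g. (a sketch; nb_classes is whatever your dataset needs):

import torch.nn as nn

nb_classes = 2  # e.g. a binary problem
# 2048 (resnet50 features) + 512 (resnet18 features) = 2560 inputs
classifier = nn.Linear(2048 + 512, nb_classes)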
Thanks for replying. So for any two models, I need to remove the classifier layer from both, concatenate their features, and then add a new classifier layer, right?
Please, does nb_classes denote the number of classes that exist in my data? def __init__(self, modelA, modelB, nb_classes=10):