Custom Ensemble approach

Please, could you check my model summary:

[VGG(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (9): ReLU(inplace)
    (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (12): ReLU(inplace)
    (13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (14): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (15): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (16): ReLU(inplace)
    (17): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (18): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (19): ReLU(inplace)
    (20): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (22): ReLU(inplace)
    (23): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (24): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (25): ReLU(inplace)
    (26): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (27): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (28): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (29): ReLU(inplace)
    (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (31): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (32): ReLU(inplace)
    (33): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (34): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (35): ReLU(inplace)
    (36): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (37): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (38): ReLU(inplace)
    (39): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (40): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (42): ReLU(inplace)
    (43): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (44): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (45): ReLU(inplace)
    (46): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (47): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (48): ReLU(inplace)
    (49): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (50): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (51): ReLU(inplace)
    (52): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
), GlobalPool(
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (maxpool): AdaptiveMaxPool2d(output_size=(1, 1))
  (exp_pool): ExpPool()
  (linear_pool): LinearPool()
  (lse_pool): LogSumExpPool()
), Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
Conv2d(1024, 1, kernel_size=(1, 1), stride=(1, 1)),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
AttentionMap(
  (channel_attention): CAModule(
    (fc1): Linear(in_features=512, out_features=256, bias=True)
    (fc2): Linear(in_features=256, out_features=512, bias=True)
    (relu): ReLU()
    (sigmoid): Sigmoid()
  )
  (spatial_attention): SAModule(
    (conv1): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
    (conv2): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1))
    (conv3): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
  )
  (pyramid_attention): FPAModule(
    (gap_branch): Sequential(
      (0): AdaptiveAvgPool2d(output_size=1)
      (1): Conv2dNormRelu(
        (conv): Sequential(
          (0): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
          (2): ReLU(inplace)
        )
      )
    )
    (mid_branch): Conv2dNormRelu(
      (conv): Sequential(
        (0): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1))
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
      )
    )
    (downsample1): Conv2dNormRelu(
      (conv): Sequential(
        (0): Conv2d(512, 1, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
        (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
      )
    )
    (downsample2): Conv2dNormRelu(
      (conv): Sequential(
        (0): Conv2d(1, 1, kernel_size=(5, 5), stride=(2, 2), padding=(2, 2))
        (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
      )
    )
    (downsample3): Conv2dNormRelu(
      (conv): Sequential(
        (0): Conv2d(1, 1, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
        (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
      )
    )
    (scale1): Conv2dNormRelu(
      (conv): Sequential(
        (0): Conv2d(1, 1, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
        (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
      )
    )
    (scale2): Conv2dNormRelu(
      (conv): Sequential(
        (0): Conv2d(1, 1, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
        (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
      )
    )
    (scale3): Conv2dNormRelu(
      (conv): Sequential(
        (0): Conv2d(1, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
      )
    )
  )
)]

What is the shape of x before feeding it to self.classifier?

Based on the model summary, it seems you've changed the model, since e.g. the classifier is now an nn.Sequential container, while it was a single linear layer before.

torch.cat yields a tensor of size [24, 1], where each of x1, x2, and x3 has size [8, 1].

Exactly, my model has some modifications, so I think I missed something.
Thank you so much for your help!

This shouldn't be the case if you have kept torch.cat((x1, x2, x3), dim=1):

x1 = torch.randn(8, 1)
x2 = torch.randn(8, 1)
x3 = torch.randn(8, 1)
x = torch.cat((x1, x2, x3), dim=1)
print(x.shape)
> torch.Size([8, 3])

Since you modified the code, please make sure to post the new (executable) code so that we can take another look.

Here is the code:

class MyEnsemble(nn.Module):
    def __init__(self, model_1, model_2, model_3, nb_classes=8):
        super(MyEnsemble, self).__init__()
        self.model_1 = model_1
        self.model_2 = model_2
        self.model_3 = model_3
        # Remove last linear layer
        self.model_1.classifier = nn.Identity()
        self.model_2.classifier = nn.Identity()
        self.model_3.classifier = nn.Identity()
        self.classifier = nn.Linear(24, 8)
        
    def forward(self, x):
        x1 = self.model_1(x.clone())  # clone to make sure x is not changed by inplace methods
        x1 = torch.stack(x1[0])
        x2 = self.model_2(x)
        x2 = torch.stack(x2[0])
        x3 = self.model_3(x)
        x3 = torch.stack(x3[0])
        x = torch.stack((x1, x2, x3), dim=1)
        x = F.relu(x.view(x.size(0), -1))
        x = self.classifier(x)
        return x

You've changed the torch.cat call to torch.stack, which will output x in the shape [8, 3, 1] if x1, x2, and x3 have the shape [8, 1].
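
For reference, a minimal comparison of the two calls, using the shapes from the discussion above:

import torch

x1 = torch.randn(8, 1)
x2 = torch.randn(8, 1)
x3 = torch.randn(8, 1)
print(torch.cat((x1, x2, x3), dim=1).shape)    # torch.Size([8, 3])
print(torch.stack((x1, x2, x3), dim=1).shape)  # torch.Size([8, 3, 1])

torch.cat joins tensors along an existing dimension, while torch.stack inserts a new one.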

@ptrblck Hi ptrblck!

I am a new learner in PyTorch. I have two trained models and want to use both of them to predict. I load the models with:

modelA = torch.load('~/CNN/pytorch/ensem_model_1.py')
modelA.eval()

modelB = torch.load('~/CNN/pytorch/ensem_model_2.py')
modelB.eval()

With your ensemble codes:

## Predict with ensemble models
class MyEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=251):
        super(MyEnsemble, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Remove last linear layer
        self.modelA.fc = nn.Identity()
        self.modelB.fc = nn.Identity()
        
        # Create new classifier
        self.classifier = nn.Linear(2048+2048, nb_classes)
        
    def forward(self, x):
        x1 = self.modelA(x.clone())  # clone to make sure x is not changed by inplace methods
        x1 = x1.view(x1.size(0), -1)
        x2 = self.modelB(x)
        x2 = x2.view(x2.size(0), -1)
        x = torch.cat((x1, x2), dim=1)
        
        x = self.classifier(F.relu(x))
        return x

The prediction is performed by:

model = MyEnsemble(modelA, modelB)
model = model.to(device)
print(check_accuracy_part34(loader_val, model))

where,

def check_accuracy_part34(loader, model):
    num_correct = 0
    num_samples = 0
    model.eval()  # set model to evaluation mode
    with torch.no_grad():
        for x, y in loader:
            x = x.to(device=device, dtype=dtype)  # move to device, e.g. GPU
            y = y.to(device=device, dtype=torch.long)
            scores = model(x)
            _, preds = scores.max(1)
            num_correct += (preds == y).sum()
            num_samples += preds.size(0)
        acc = float(num_correct) / num_samples
        print('  Got %d / %d correct (%.2f)' % (num_correct, num_samples, 100 * acc))
        return acc, preds

My problem is that the validation accuracy I get from the ensembled model is very, very low, about 0.3%. But if I test the performance of my single model only, the validation accuracy is about 65%. It feels like the ensembled model does not inherit the trained parameters. I am not sure if my intuition is correct or not. Any help with this?

Thanks in advance!

Note that you are creating a new linear layer in MyEnsemble, which will be randomly initialized.
This classifier will take the penultimate activations from both base models and output the new predictions.
If you haven't retrained this layer, the performance is expected to be bad.
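
A minimal sketch of such a retraining step (assuming the MyEnsemble module, device, and data loading from the surrounding posts; loader_train and the optimizer settings are placeholders):

import torch
import torch.nn as nn

model = MyEnsemble(modelA, modelB)
model = model.to(device)

# Freeze the pretrained submodels so only the new classifier is updated
for param in model.modelA.parameters():
    param.requires_grad = False
for param in model.modelB.parameters():
    param.requires_grad = False

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)

model.train()
for x, y in loader_train:  # hypothetical training DataLoader
    x = x.to(device)
    y = y.to(device)
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()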

Could you please give an example of how to ensemble two models in Faster R-CNN?
A = model1.pth
B = model2.pth
I need C = model = (A + B) for a single input.
Code example:

RoIHeads(
  (box_roi_pool): MultiScaleRoIAlign()
  (box_head): TwoMLPHead(
    (fc6): Linear(in_features=12544, out_features=1024, bias=True)
    (fc7): Linear(in_features=1024, out_features=1024, bias=True)
  )
  (box_predictor): FastRCNNPredictor(
    (cls_score): Linear(in_features=1024, out_features=50, bias=True)
    (bbox_pred): Linear(in_features=1024, out_features=412, bias=True)
  )
)

It depends on which features or outputs you would like to concatenate and how this ensemble should look.
Could you add some more information, so that I can see how it might be implemented?

Thanks for the reply.
I have a modelA which is trained on class 'a' and a modelB which is trained on class 'b'. Both see similar types of data (X), so I need a modelC to predict classes 'a' and 'b' for an input X.
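
One possible starting point, following the same pattern as the earlier snippets in this thread (a sketch only; it assumes both models keep their features behind a .fc attribute with 2048 outputs, as in a ResNet, so adjust to your actual architectures):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ModelC(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=2):
        super(ModelC, self).__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Drop the original heads, keep the feature extractors
        self.modelA.fc = nn.Identity()
        self.modelB.fc = nn.Identity()
        # New classifier deciding between class 'a' and class 'b'
        self.classifier = nn.Linear(2048 + 2048, nb_classes)

    def forward(self, x):
        featsA = self.modelA(x.clone())
        featsB = self.modelB(x)
        feats = torch.cat((featsA, featsB), dim=1)
        return self.classifier(F.relu(feats))

As with the other ensembles above, the new classifier would need to be trained on data containing both classes.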

Thanks again. Recently I have been learning about ensembles; ensembling includes Bagging, Boosting, and Stacking.
Could you tell me which kind of method your approach belongs to?

My code example would probably come close to the Stacking method, since another classifier is trained on top of the feature outputs from the pretrained models.

A quick recap of the mentioned techniques (since you are currently studying them, please correct me, as I haven't looked into them recently):

  • Bagging - would involve bootstrapping during the sampling phase and should thus be independent of the model architecture (it would use weak learners, if I'm not mistaken)
  • Boosting - sequential weak learners, which are trained on the "residuals" of the preceding classifiers
  • Stacking - staged classifiers, which would use the output of the previous stage to create a new output

Thanks a lot. In fact, I want to use PyTorch to implement the three methods, especially Bagging. Many papers say that using an ensemble can improve the results. I searched Google for related code on GitHub and Kaggle but couldn't find anything similar (most of it uses Keras), until I used your code, which gave a better result. However, in the "ensemble" topic most of your code uses concatenation, for example x = torch.cat((x1, x2), dim=1),
but some papers say they use averaging or voting methods to implement the ensemble, and I cannot find related code on the PyTorch forums. Is there a full-blown example of how to ensemble two models, for example vgg19 and resnet18, for prediction on the same dataset using the voting or averaging method? Or could you recommend some code for me to learn from? I would really appreciate it.

I don't know if there are examples, but to calculate the average of multiple model outputs, you could use:

outputA = modelA(data)
outputB = modelB(data)
outputs = (F.softmax(outputA, 1) + F.softmax(outputB, 1)) / 2.

(if you have multiple models, you could of course use a loop, if that's easier)
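
For (hard) majority voting instead of averaging, a rough sketch could look like this (modelC is a hypothetical third model; with an even number of voters, ties would need a tie-breaking rule):

import torch

predsA = modelA(data).argmax(dim=1)
predsB = modelB(data).argmax(dim=1)
predsC = modelC(data).argmax(dim=1)
votes = torch.stack((predsA, predsB, predsC), dim=1)  # shape [batch_size, 3]
final_preds = votes.mode(dim=1).values  # most frequent class per sample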

To implement the bootstrap sampling, you could use e.g. sklearn.cross_validation.Bootstrap.
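
(Note: sklearn.cross_validation has since been removed from scikit-learn. A similar effect can be achieved with sklearn.utils.resample; the sketch below assumes dataset is an indexable PyTorch Dataset:)

import numpy as np
import torch
from sklearn.utils import resample

indices = np.arange(len(dataset))
boot_indices = resample(indices, replace=True, n_samples=len(dataset))
boot_dataset = torch.utils.data.Subset(dataset, boot_indices.tolist())

Alternatively, torch.utils.data.RandomSampler(dataset, replacement=True) draws samples with replacement directly in PyTorch.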

Hi dear Rosa,
How can I update my PyTorch? Could you please tell me the command for it?
Thank you

@ptrblck, thank you. I have a similar requirement, and I have one more doubt: what should I do with the loss? Should I add the losses from the 2 models, and if yes, where? I am very new to PyTorch and would appreciate your input. Both of my models are image classifiers, and I am experimenting with more than one pretrained model, like inception_v3, resnet, etc. In my case both models will be the same, say inception_v3, and both will have an equal number of classes in the output (10). The final output will be binary, so the output will identify whether the image is a vowel or a consonant. The input data has images of vowels and consonants.

In my code snippets the output of the submodels is fed to a new classifier, and the original classifiers in the submodels are removed.
This would yield a single output, and thus a single loss would be calculated.

@ptrblck, thanks for the response, very helpful. In my case I realized that each classifier will give a single output (for example, the first one will predict the class of the vowel and the second the class of the consonant). As per my understanding, the code snippet you gave will work for this case as well. I am going to try it now with my data and will update. I wanted to add this and get your advice.

Your approach should work fine, and you could create two final classifiers in the main model and feed the two outputs from the submodules to them, e.g. as in the sketch below.
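
A rough sketch of such a two-head setup (a sketch only; feat_dim=2048 and the .fc attribute are assumptions that hold for e.g. resnet50 or inception_v3 backbones):

import torch
import torch.nn as nn

class TwoHeadEnsemble(nn.Module):
    def __init__(self, model_vowel, model_consonant, feat_dim=2048, n_classes=10):
        super(TwoHeadEnsemble, self).__init__()
        self.model_vowel = model_vowel
        self.model_consonant = model_consonant
        # Replace the original classifiers with identity, keeping the feature extractors
        self.model_vowel.fc = nn.Identity()
        self.model_consonant.fc = nn.Identity()
        # Two new heads, one per task
        self.head_vowel = nn.Linear(feat_dim, n_classes)
        self.head_consonant = nn.Linear(feat_dim, n_classes)

    def forward(self, x):
        out_vowel = self.head_vowel(self.model_vowel(x.clone()))
        out_consonant = self.head_consonant(self.model_consonant(x))
        return out_vowel, out_consonant

During training, each output would get its own criterion, and the two losses can be summed before calling backward().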
