UNet Ensemble Model and how could I construct the last layer as a classifier?

I am thinking of combining 2 UNets into one ensemble model.

Inspired by this link Custom Ensemble approach - #4 by ptrblck, I think I might know how to do that. However, for UNet the last layer is a different story.

The output of each UNet has a different format from the one discussed in the post linked above.

Let’s say the input is:

x = torch.randn(20, 1, 32, 32)

UNet1’s output shape is:

x1: torch.Size([5, 3, 32, 32])

UNet2’s output shape is:

x2: torch.Size([5, 3, 32, 32])

The expected output shape of the UNet Ensemble model is:

x1+x2: torch.Size([5, 3, 32, 32])

How can I make that happen? Any suggestions?

Similar to the posted 1D model you could concatenate the output features of both models (e.g. in the channel dimension) and pass it through a final nn.Conv2d layer mapping the concatenated channels to the number of classes.

Thanks, I was able to use nn.Conv2d(6, 3) as the last layer’s classifier for the 2 UNet models (UNetEnsemble), but the performance (with 2 sets of pretrained weights) was horrible (Dice: 0.05). However, UNet1’s Dice was 0.89, and UNet2’s Dice was 0.88. Did I concatenate the 2 outputs of UNet1 and UNet2 wrong? Do I need to train the last classifier?

import torch
import torch.nn as nn
import torch.nn.functional as F

class UNetEnsemble(nn.Module):
    def __init__(self, modelA, modelB, nb_classes=3):
        super().__init__()
        self.modelA = modelA
        self.modelB = modelB
        # Create new classifier to merge both outputs:
        # x1 and x2 each have shape [N, nb_classes, H, W];
        # concatenated in the channel dimension: [N, 2 * nb_classes, H, W];
        # the conv layer maps this back to [N, nb_classes, H, W].
        self.classifier = nn.Conv2d(2 * nb_classes, nb_classes, kernel_size=3, stride=1, padding=1)

    def forward(self, x):
        x1 = self.modelA(x.clone())  # clone to make sure x is not changed by inplace methods
        x2 = self.modelB(x)
        x = torch.cat((x1, x2), dim=1)  # concatenate in the channel dimension
        x = self.classifier(F.relu(x))
        return x

Yes, you need to train the newly initialized classifier since it’s currently using random weights.
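As a rough sketch of what training only the new classifier could look like: the two nn.Conv2d modules below are hypothetical stand-ins for the pretrained UNets, and the batch, optimizer, learning rate, and loss are placeholder assumptions, not values from this thread. The pretrained weights are frozen and only the fusion layer is updated.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two pretrained UNets (1 input channel, 3 class logits).
modelA = nn.Conv2d(1, 3, kernel_size=3, padding=1)
modelB = nn.Conv2d(1, 3, kernel_size=3, padding=1)

# Freeze the pretrained weights so only the new layer is updated.
for p in list(modelA.parameters()) + list(modelB.parameters()):
    p.requires_grad_(False)
modelA.eval()
modelB.eval()

# New, randomly initialized layer that fuses the 2 * 3 concatenated channels.
classifier = nn.Conv2d(6, 3, kernel_size=3, stride=1, padding=1)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)  # assumed settings
criterion = nn.CrossEntropyLoss()

# Dummy batch; replace with real images and integer class masks.
x = torch.randn(4, 1, 32, 32)
target = torch.randint(0, 3, (4, 32, 32))

for step in range(5):
    with torch.no_grad():  # the base models are frozen, so no graph is needed here
        feats = torch.cat((modelA(x), modelB(x)), dim=1)
    loss = criterion(classifier(feats), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Only classifier.parameters() is passed to the optimizer, so the pretrained UNet weights stay untouched.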
If you do not want to train a new layer, you might want to consider e.g. a weighted prediction of both models etc.
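One possible reading of a "weighted prediction" is a fixed weighted average of the per-pixel softmax probabilities of both models. This is a minimal sketch under that assumption; the function name weighted_ensemble and the weights 0.6 / 0.4 are made up for illustration, and the random logits stand in for the real UNet outputs.

```python
import torch
import torch.nn.functional as F

def weighted_ensemble(logits_a, logits_b, w_a=0.5, w_b=0.5):
    """Weighted average of the per-pixel class probabilities of two models."""
    probs = w_a * F.softmax(logits_a, dim=1) + w_b * F.softmax(logits_b, dim=1)
    return probs.argmax(dim=1), probs

# Random logits standing in for the outputs of UNet1 and UNet2.
logits_a = torch.randn(5, 3, 32, 32)
logits_b = torch.randn(5, 3, 32, 32)
pred, probs = weighted_ensemble(logits_a, logits_b, w_a=0.6, w_b=0.4)
print(pred.shape)  # torch.Size([5, 32, 32])
```

The weights could for example be chosen from each model's validation Dice score, but nothing has to be trained.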

How many different methods are there for dealing with such a scenario in ensemble learning? The current method I am using is to use Softmax to pull out each pixel’s probability. For example, each pixel has a potential prediction of 0 or 1. If UNet1 predicted 0 (88%) and UNet2 predicted 1 (90%), then that pixel will be predicted as 1 because UNet2 gets a result of 1 with a 90% probability.

  • Combine multiple models, add a new last layer, and train that layer (training required);

  • Combine multiple models, use the Softmax function to pull out probabilities, and keep the predictions only with the highest probabilities (no need to retrain);

  • Any other methods?
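For reference, the second option (keep the prediction of whichever model is more confident at each pixel) could be sketched as follows. The function name max_confidence_ensemble is hypothetical, and the toy logits are constructed to reproduce the 88% vs. 90% example above for a single pixel.

```python
import torch
import torch.nn.functional as F

def max_confidence_ensemble(logits_a, logits_b):
    """Per pixel, keep the class predicted by the more confident model."""
    conf_a, pred_a = F.softmax(logits_a, dim=1).max(dim=1)
    conf_b, pred_b = F.softmax(logits_b, dim=1).max(dim=1)
    return torch.where(conf_a >= conf_b, pred_a, pred_b)

# One pixel, two classes: model A predicts class 0 with 88%,
# model B predicts class 1 with 90% (softmax(log(p)) == p).
logits_a = torch.tensor([0.88, 0.12]).log().view(1, 2, 1, 1)
logits_b = torch.tensor([0.10, 0.90]).log().view(1, 2, 1, 1)
pred = max_confidence_ensemble(logits_a, logits_b)
print(pred.item())  # 1
```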

Another question is that in your sample code you have “self.modelA(x.clone())” and you said “clone to make sure x is not changed by inplace methods”. Does it stop training/gradient descent if I choose to train modelA? What if I use copy.deepcopy()? Is copy.deepcopy() == .clone()?

@rasbt explains some ensemble methods in these slides, and a few of these might be easily applicable to your use case, e.g. majority or soft voting. You could also take a look at some sklearn modules, e.g. VotingClassifier, and try to use these.
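Since sklearn's VotingClassifier expects scikit-learn estimators, the same idea can also be written directly on tensors. Below is a small sketch of hard (majority) voting per pixel; majority_vote and fake_logits are hypothetical helper names, and a third model is assumed so that a majority exists. Ties are resolved in favor of the lowest class index, because argmax returns the first maximum.

```python
import torch
import torch.nn.functional as F

def majority_vote(logits_list):
    """Hard voting: each model votes for its argmax class at every pixel."""
    num_classes = logits_list[0].shape[1]
    preds = torch.stack([l.argmax(dim=1) for l in logits_list])  # [M, N, H, W]
    votes = F.one_hot(preds, num_classes).sum(dim=0)             # [N, H, W, C]
    return votes.argmax(dim=-1)                                  # [N, H, W]

def fake_logits(cls, num_classes=3):
    """Toy single-pixel output that clearly predicts class `cls`."""
    out = torch.zeros(1, num_classes, 1, 1)
    out[0, cls, 0, 0] = 1.0
    return out

# Two models vote for class 2, one votes for class 0.
pred = majority_vote([fake_logits(2), fake_logits(2), fake_logits(0)])
print(pred.item())  # 2
```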

No, it doesn’t stop the training or backpropagation since clone doesn’t detach the computation graph.
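A quick way to check this yourself: gradients still reach the original tensor through clone(), while detach() cuts the graph. The tensor names here are just for the demo.

```python
import torch

w = torch.randn(3, requires_grad=True)

y = (w * 2).clone()  # clone is tracked by autograd
y.sum().backward()
print(w.grad)        # tensor([2., 2., 2.])

d = (w * 2).detach()    # detach, in contrast, drops the gradient history
print(d.requires_grad)  # False
```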

No, you won’t be able to use deepcopy on a non-leaf tensor (i.e. a tensor which has a gradient history and wasn’t directly created by you).
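A small check of this behavior, assuming a PyTorch version where the deepcopy restriction on non-leaf tensors is still in place (it is in the releases I know of): deepcopy works on a leaf tensor but raises a RuntimeError on a tensor that carries a gradient history.

```python
import copy
import torch

leaf = torch.randn(3, requires_grad=True)  # created directly by the user
non_leaf = leaf * 2                        # has a grad_fn, so it is not a leaf

leaf_copy = copy.deepcopy(leaf)  # works: independent copy of the leaf

try:
    copy.deepcopy(non_leaf)
    deepcopy_failed = False
except RuntimeError as err:
    deepcopy_failed = True
    print("deepcopy failed:", err)

cloned = non_leaf.clone()  # clone works and stays attached to the graph
```

Note that copy.deepcopy on a whole nn.Module is fine, since its parameters are leaf tensors.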