I would like to ensemble some segmentation models pre-trained on slightly different datasets and create a new segmentation model from them. How can I do that? I saw a few examples with classification problems, but I did not see any with segmentation.
Do you mean something like the same architecture (e.g. DeepLabv3 or PSPNet) pre-trained on different datasets and then fine-tuned on a shared dataset? For instance, if you pre-trained one model on Pascal VOC 2012 and another on MS COCO Stuff, they would have different classes, and hence different network head weight shapes. But if you only want to use the classes (20 or so) from Pascal VOC, you could do this and test on either Pascal VOC or MS COCO Stuff, filtering the latter's categories down to the desired 20.
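To reuse backbones pre-trained with different class counts, one common trick is to load only the checkpoint tensors whose names and shapes match the target model, dropping the mismatched head. Here's a minimal sketch with plain dicts of numpy arrays (the names `filter_compatible`, `backbone.w`, `head.w` are illustrative, not from any specific library; with PyTorch you would filter the checkpoint's `state_dict()` the same way before calling `load_state_dict(..., strict=False)`):

```python
import numpy as np

def filter_compatible(model_shapes, checkpoint):
    """Keep checkpoint tensors whose name exists in the target model and
    whose shape matches; head weights with a different number of classes
    are dropped so the backbone weights can still be reused."""
    return {name: tensor for name, tensor in checkpoint.items()
            if name in model_shapes and tensor.shape == model_shapes[name]}

# Hypothetical checkpoint trained with 171 COCO Stuff classes,
# target model with 21 Pascal VOC classes (20 + background):
checkpoint = {
    "backbone.w": np.zeros((64, 3, 3, 3)),
    "head.w": np.zeros((171, 64, 1, 1)),
}
target_shapes = {
    "backbone.w": (64, 3, 3, 3),
    "head.w": (21, 64, 1, 1),
}
kept = filter_compatible(target_shapes, checkpoint)
# Only the backbone weights survive; the head is re-initialized
# and fine-tuned on the shared dataset.
```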
Then there’s the question of how you combine the predictions. You could simply take a weighted sum of the logits from each model at inference time and apply an argmax computed pixel-wise across classes.
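The weighted-logits-plus-argmax step could look like this (a numpy sketch; the function name and the equal default weights are my assumptions, and in practice each logit map would come from a model's forward pass):

```python
import numpy as np

def ensemble_logits(logit_maps, weights=None):
    """Combine per-model logit maps of shape (num_classes, H, W) by a
    weighted average, then take a pixel-wise argmax over the class axis.

    logit_maps: list of arrays, one per model, all the same shape
    weights:    optional per-model weights; defaults to a plain average
    """
    if weights is None:
        weights = [1.0 / len(logit_maps)] * len(logit_maps)
    combined = sum(w * lm for w, lm in zip(weights, logit_maps))
    return combined.argmax(axis=0)  # (H, W) mask of class indices
```

With calibrated models you could also average softmax probabilities instead of raw logits; which works better tends to be an empirical question.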
I notice a lot of the contest winners in semantic segmentation use ensemble models, so if you want to check out the technical reports from the MS COCO Stuff, Pascal VOC 2012, or Cityscapes leaderboards, I bet you could find some general descriptions of how they do it. Often they train different models for different image scales, and at test time make multi-scale predictions by averaging the logits after interpolating them to the desired output size: model A predicts a mask for an image of size 600×600, model B for an image of size 800×800, etc. If you then need a final prediction of size 500×500, you can get it by bilinear/bicubic interpolation of each logit map to that final size, averaging the prediction values at each pixel.
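That multi-scale averaging step could be sketched as follows (assuming per-model logit maps as numpy arrays and using `scipy.ndimage.zoom` with `order=1` for bilinear interpolation; the function name is mine):

```python
import numpy as np
from scipy.ndimage import zoom

def multiscale_average(logit_maps, out_hw):
    """Resize each model's logit map of shape (num_classes, H_i, W_i)
    to a common (num_classes, H, W) via bilinear interpolation, then
    average them pixel-wise."""
    h, w = out_hw
    resized = []
    for lm in logit_maps:
        _, hi, wi = lm.shape
        # zoom factors per axis: keep the class axis, rescale H and W
        resized.append(zoom(lm, (1.0, h / hi, w / wi), order=1))
    return np.mean(resized, axis=0)
```

A final pixel-wise argmax over the class axis of the averaged map then gives the output mask.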
Thank you for your reply,
Actually, I am working with a UNet model on a multiclass segmentation task. The datasets are quite similar, same size and pixels, only the grid changes a little bit. I have 25 types of ground-truth masks and 4 different tumour shapes that I can add to them, but I cannot repeat the ground-truth types. I was thinking of using the ensemble approach because my dataset is very small (n=25). I saw this paper, [1711.01468] Ensembles of Multiple Models and Architectures for Robust Brain Tumour Segmentation, discussing it, but I did not fully understand it.
Medical images, breast phantoms.