Emerging 2 different models to create a multi-model

The linked use cases are a bit different as the first one (merging two models) seems to pass features from one model to the other and could train these models in an end2end manner.
The “ensemble” approach uses two pretrained models to combine their output features (often from the penultimate layer) to train a new classifier, which could boost the performance compared to each model standalone. However, if you want to train all models from scratch, take a look at the Stacking section of ensemble learning as the training procedure is using “stages” and differs from the “standard” training routine.
In particular the classifier expects the stage0 models to be trained already and expects to see their predictions while bein trained.
This would mean that an end2end training might yield a worse performance and you would have to create separate training and validation splits to train each stage.