Training models together? How?

Usually the model are trained independently, and their penultimate activation is used in a model ensemble to create the final classifier.
Here is a small exampe of a model ensemble.
Based on the description, some blending was used, so maybe the class logits/probabilities were reduced instead of using a final classifier.