Training models together? How?

I was browsing on Kaggle when I found the following:

Models and Input sizes

My final submission was a simple average of the following eight models. Inceptions and ResNets usually blend well. If I could have two more weeks I would definitely add some EfficientNets.

2 x inception_resnet_v2, input size 512
2 x inception_v4, input size 512
2 x seresnext50, input size 512
2 x seresnext101, input size 384

It's the first time I've heard about training models together. What does it mean? Does PyTorch support that? Any examples?

How does one average 8 models!?

Usually the models are trained independently, and their penultimate activations are used in a model ensemble to create the final classifier.
Here is a small example of a model ensemble.
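A minimal sketch of that idea, using hypothetical tiny backbones as stand-ins for the pretrained models (names, sizes, and the `Backbone` module are illustrative, not from the original post): each frozen model produces a penultimate feature vector, the features are concatenated, and only a final classifier is trained on top.

```python
import torch
import torch.nn as nn

class Backbone(nn.Module):
    # Stand-in for a pretrained model with its classification head removed;
    # forward() returns the penultimate activation, not class logits.
    def __init__(self, feat_dim=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Flatten(),
            nn.Linear(3 * 8 * 8, feat_dim),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.features(x)

class Ensemble(nn.Module):
    def __init__(self, backbones, feat_dim=16, num_classes=10):
        super().__init__()
        self.backbones = nn.ModuleList(backbones)
        # Final classifier trained on the concatenated penultimate activations.
        self.classifier = nn.Linear(feat_dim * len(backbones), num_classes)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.backbones], dim=1)
        return self.classifier(feats)

models = [Backbone() for _ in range(2)]
# Typically the backbones stay frozen; only the final classifier is trained.
for m in models:
    for p in m.parameters():
        p.requires_grad = False

ensemble = Ensemble(models)
x = torch.randn(4, 3, 8, 8)   # dummy batch of 4 images
out = ensemble(x)
print(out.shape)              # torch.Size([4, 10])
```

With real models you would load the pretrained weights, strip the final layer (e.g. replace `model.fc` with `nn.Identity()`), and train only `Ensemble.classifier`.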
Based on the description, some blending was used, so the class logits or probabilities were probably averaged directly instead of training a final classifier.
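That averaging approach can be sketched like this, assuming each model outputs class logits (the eight random tensors here are just stand-ins for the real models' predictions): apply softmax per model, then average the probabilities across models.

```python
import torch

torch.manual_seed(0)
# Stand-ins for the logits of eight independently trained models
# (a batch of 4 samples, 10 classes each).
all_logits = [torch.randn(4, 10) for _ in range(8)]

# Blend by averaging the softmax probabilities over the 8 models.
probs = torch.stack([l.softmax(dim=1) for l in all_logits]).mean(dim=0)
preds = probs.argmax(dim=1)

print(probs.shape)  # torch.Size([4, 10])
# The averaged probabilities still sum to 1 for each sample.
print(torch.allclose(probs.sum(dim=1), torch.ones(4)))  # True
```

Averaging raw logits instead of probabilities is also common; which works better is usually decided empirically on a validation set.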