Ensemble multiple models together

I am thinking of taking two pretrained CNN models and stacking them via soft voting for image classification to generate the test results.

At test time (both models are already pretrained), I let each model generate its own set of predictions. Then I apply softmax to obtain the predictive probabilities and take the class with the highest probability as the final output.
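The soft-voting idea above can be sketched in PyTorch as follows. The two `nn.Linear` "models" are stand-ins for the pretrained CNNs (an assumption for brevity); the voting logic itself, averaging the softmax outputs and taking the argmax, is what the question describes.

```python
import torch
import torch.nn as nn

# Stand-in "pretrained" models (assumption: in practice these would be
# pretrained CNNs, e.g. torchvision backbones with loaded weights).
model_a = nn.Linear(8, 3)
model_b = nn.Linear(8, 3)
model_a.eval()
model_b.eval()

def soft_vote(x):
    """Average the softmax probabilities of both models, then take argmax."""
    with torch.no_grad():
        p_a = torch.softmax(model_a(x), dim=1)
        p_b = torch.softmax(model_b(x), dim=1)
    avg = (p_a + p_b) / 2      # soft voting: mean predictive probability
    return avg.argmax(dim=1)   # class with the highest averaged probability

preds = soft_vote(torch.randn(4, 8))
print(preds.shape)  # torch.Size([4])
```

Nothing here is trainable; the ensemble is purely a test-time combination rule.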

I found a tutorial online: How to Train an Ensemble of Convolutional Neural Networks for Image Classification | Medium

Basically, their approach is to ensemble multiple models and connect them via a fully-connected layer, so only the fully-connected layer is trainable.
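A minimal sketch of that trainable-ensemble idea (assuming small stand-in base models rather than real CNNs): the base models are frozen, their outputs are concatenated, and only the final fully-connected layer receives gradients.

```python
import torch
import torch.nn as nn

class StackedEnsemble(nn.Module):
    """Concatenate the outputs of frozen base models; only `fc` is trainable."""
    def __init__(self, models, num_classes):
        super().__init__()
        self.models = nn.ModuleList(models)
        for m in self.models:
            for p in m.parameters():
                p.requires_grad = False  # freeze the pretrained backbones
        self.fc = nn.Linear(num_classes * len(models), num_classes)

    def forward(self, x):
        feats = torch.cat([m(x) for m in self.models], dim=1)
        return self.fc(feats)

# Stand-in base models (assumption: pretrained CNNs in practice).
ens = StackedEnsemble([nn.Linear(8, 3), nn.Linear(8, 3)], num_classes=3)
out = ens(torch.randn(2, 8))
trainable = [n for n, p in ens.named_parameters() if p.requires_grad]
print(trainable)  # only fc.weight and fc.bias
```

Only `ens.fc` would be updated by an optimizer, which matches the tutorial's setup where the pretrained models stay fixed.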

However, my method lets each model predict on its own and uses the probabilities to decide the final result.

I am not sure which is an appropriate way to do ensemble learning. Any idea?

Using voting on the predictions of different models sounds valid, and you could use scikit-learn for it, which provides different voting approaches.
A trainable multi-stage ensemble also sounds valid, but might be more complicated to train.
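For reference, scikit-learn's `VotingClassifier` implements soft voting directly: with `voting="soft"` it averages the `predict_proba` outputs of its base estimators. The estimators below are simple sklearn classifiers used only to illustrate the API (the question's CNNs would need a wrapper exposing `predict_proba`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# voting="soft" averages predict_proba across the base estimators,
# then predicts the class with the highest averaged probability.
vote = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(random_state=0)),
    ],
    voting="soft",
)
vote.fit(X, y)

preds = vote.predict(X[:5])
proba = vote.predict_proba(X[:2])
print(preds.shape, proba.shape)
```

`voting="hard"` would instead take a majority vote over the predicted labels, discarding the probability information.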