Merging two or more models together

I’m not sure what “merge” means in this case: do you want to reduce the different parameter sets to a single one, or do you want to create a model ensemble?
In the former case, you could use this approach, but note this concern. In the latter case, you could use this approach.
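To make the two readings of “merge” concrete, here is a minimal, framework-agnostic sketch. It is not the linked approach: plain Python lists stand in for tensors, and the names `average_parameters`, `ensemble_predict` and `make_linear` are hypothetical helpers made up for illustration. The first function reduces several parameter sets to one; the second keeps all models and averages their outputs.

```python
def average_parameters(param_sets):
    """Element-wise mean of several parameter dicts with identical keys and shapes."""
    n = len(param_sets)
    return {
        key: [sum(vals) / n for vals in zip(*(p[key] for p in param_sets))]
        for key in param_sets[0]
    }


def ensemble_predict(predict_fns, x):
    """Keep every model and average their predictions for the same input."""
    outputs = [f(x) for f in predict_fns]
    return sum(outputs) / len(outputs)


def make_linear(params):
    """Toy model y = w * x + b built from a parameter dict (illustration only)."""
    w, b = params["weight"][0], params["bias"][0]
    return lambda x: w * x + b


params_a = {"weight": [2.0], "bias": [1.0]}
params_b = {"weight": [4.0], "bias": [3.0]}

# Reading 1: a single model with averaged parameters.
merged = average_parameters([params_a, params_b])
single_model = make_linear(merged)

# Reading 2: an ensemble that averages the outputs of both models.
ensemble_out = ensemble_predict([make_linear(params_a), make_linear(params_b)], 1.0)
```

For this purely linear toy model both readings give the same prediction. That equivalence generally does not hold once the model contains nonlinearities.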

This would be my concern, but you should run your experiments and see if it works for your use case. For example, Stochastic Weight Averaging also works, but it averages “similar” checkpoints from one training run, not completely different models, which might have converged to different minima.
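A toy example of how averaging can go wrong for models that ended up in different minima. This is only a sketch under made-up assumptions: a hypothetical two-unit ReLU network in plain Python, where `model_b` is `model_a` with its hidden units swapped, so both compute the same function `|x|`.

```python
def relu(z):
    return max(0.0, z)


def predict(params, x):
    # One hidden layer with two ReLU units and a scalar output.
    return sum(v * relu(w * x) for w, v in zip(params["hidden"], params["out"]))


# Two "different" solutions for the same function |x|:
# model_b is model_a with the hidden units permuted.
model_a = {"hidden": [1.0, -1.0], "out": [1.0, 1.0]}
model_b = {"hidden": [-1.0, 1.0], "out": [1.0, 1.0]}

# Naive parameter averaging cancels the hidden weights to zero.
averaged = {
    key: [(a + b) / 2 for a, b in zip(model_a[key], model_b[key])]
    for key in model_a
}

x = 2.0
out_a = predict(model_a, x)          # 2.0
out_b = predict(model_b, x)          # 2.0
out_averaged = predict(averaged, x)  # 0.0, the averaged model no longer computes |x|
out_ensemble = (out_a + out_b) / 2   # 2.0, averaging the outputs keeps the function
```

Both models are equally good, yet their parameter average is useless, while the ensemble is unaffected. Checkpoints that are close to each other in parameter space would not cancel out like this, which is why averaging them is less risky.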