In that case you could get the `state_dict` from each model, average all parameters (or use another reduction instead of the mean), and load the resulting `state_dict` into a single model.
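A minimal sketch of the averaging step, assuming all models share exactly the same architecture (and therefore the same `state_dict` keys and shapes). The helper name `average_state_dicts` and the small `nn.Linear` models are hypothetical stand-ins for your own trained models:

```python
from collections import OrderedDict

import torch
import torch.nn as nn


def average_state_dicts(state_dicts):
    """Return a new state_dict holding the element-wise mean of the inputs.

    Hypothetical helper: assumes every state_dict has identical keys and shapes.
    """
    averaged = OrderedDict()
    for key, reference in state_dicts[0].items():
        # Stack the same entry from every model along a new leading dimension.
        stacked = torch.stack([sd[key].float() for sd in state_dicts], dim=0)
        # Cast back so integer buffers (e.g. BatchNorm's num_batches_tracked)
        # keep their original dtype.
        averaged[key] = stacked.mean(dim=0).to(reference.dtype)
    return averaged


torch.manual_seed(0)
# Stand-ins for separately trained models with the same architecture.
models = [nn.Linear(4, 2) for _ in range(3)]
avg_sd = average_state_dicts([m.state_dict() for m in models])

# Load the averaged parameters into a fresh model of the same architecture.
merged = nn.Linear(4, 2)
merged.load_state_dict(avg_sd)
```

To try another reduction, you could swap `stacked.mean(dim=0)` for e.g. `stacked.median(dim=0).values`.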
However, I’m very skeptical that this approach will give you good results.
Since each model might have converged to a different local minimum, I don’t think that averaging the parameters of these minima will land you in another minimum.
Let us know how the experiment goes and whether you were able to achieve good performance using this approach.