Training a linear combination of PyTorch models

ptrblck · April 5, 2020, 6:54am

In that case, you could get the state_dicts from each model, average all parameters (or use another reduction instead of the mean), and load the resulting state_dict into a single model.
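A minimal sketch of that averaging step, assuming all models share the same architecture (and therefore the same state_dict keys and shapes). The helper `average_state_dicts` and the small `make_model` network are hypothetical names used only for illustration:

```python
import torch
import torch.nn as nn


def average_state_dicts(models):
    """Return a state_dict whose floating-point entries are the mean over all models."""
    state_dicts = [m.state_dict() for m in models]
    averaged = {}
    for key, ref in state_dicts[0].items():
        if ref.is_floating_point():
            # Stack the same tensor from every model, then reduce over the new dim.
            # Swap .mean(dim=0) for another reduction if desired.
            averaged[key] = torch.stack([sd[key] for sd in state_dicts]).mean(dim=0)
        else:
            # Integer buffers (e.g. BatchNorm's num_batches_tracked) can't be
            # meaningfully averaged, so this sketch copies them from the first model.
            averaged[key] = ref.clone()
    return averaged


def make_model():
    # Hypothetical architecture; every model must be built the same way.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


torch.manual_seed(0)
models = [make_model() for _ in range(3)]

merged = make_model()
merged.load_state_dict(average_state_dicts(models))
```

Note that this only averages the stored tensors. It says nothing about whether the merged model performs well, which is the concern raised below.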

However, I'm very skeptical that this approach will give you good results.
Since each model might have converged to a different local minimum, I don't think that averaging the parameters of several local minima will, in general, give you another minimum.
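To illustrate the concern, here is a contrived one-dimensional example (a toy loss, not a real network): the function below has two minima, at w = -1 and w = +1. If two "models" converge to these two points, their parameter average lands at w = 0, which is a local maximum of this loss (its second derivative there is 12·0² − 4 = −4).

```python
import torch


def loss(w):
    # Toy non-convex loss with two minima at w = -1 and w = +1 (loss 0 at both).
    return (w ** 2 - 1) ** 2


w_a = torch.tensor(-1.0)  # "model A" converged to this minimum
w_b = torch.tensor(1.0)   # "model B" converged to the other minimum
w_avg = (w_a + w_b) / 2   # averaging the parameters gives w = 0.0

print(loss(w_a).item(), loss(w_b).item(), loss(w_avg).item())  # 0.0 0.0 1.0
```

Real loss surfaces are high-dimensional and this toy case does not prove that averaging always fails, but it shows why the average of two minima need not be a minimum.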

Let us know how the experiment goes and whether you were able to achieve good performance using this approach.