Best way to compute the average of multiple models

I think averaging the parameter values in the state_dicts is a valid approach, and I've suggested it here in the past.
You should of course check whether this approach makes sense for your trained models.
E.g. if you are trying to compute the "average model" from completely independent training runs, each model could converge to a different set of parameters, and I would assume that the averaged model would then perform poorly.
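
A minimal sketch of the state_dict averaging, assuming all models share the same architecture (the `average_state_dicts` helper and the `nn.Linear` example are just for illustration):

```python
import copy
import torch
import torch.nn as nn

def average_state_dicts(models):
    """Return a state_dict with the element-wise mean of the models' parameters."""
    avg_state_dict = copy.deepcopy(models[0].state_dict())
    for key in avg_state_dict:
        # Stack the corresponding tensors from every model and average them.
        # Cast to float for the mean, since some buffers (e.g. num_batches_tracked)
        # are integer tensors, then cast back to the original dtype.
        stacked = torch.stack([m.state_dict()[key].float() for m in models], dim=0)
        avg_state_dict[key] = stacked.mean(dim=0).to(avg_state_dict[key].dtype)
    return avg_state_dict

# Usage: load the averaged parameters into a fresh model instance.
models = [nn.Linear(10, 2) for _ in range(3)]
avg_model = nn.Linear(10, 2)
avg_model.load_state_dict(average_state_dicts(models))
```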