Pick the n best weights

Can I store the weights for each model, then pick the n best, average them, and load the averaged weights into the model?

You could store multiple state_dicts, load them afterwards, and average them entry by entry to create a single averaged state_dict.
A small example for two different state_dicts is given here.
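
To extend the idea from two state_dicts to the n best, something along these lines might work. This is only a rough sketch: the `average_state_dicts` helper, the `checkpoints` list, and the validation losses used for ranking are made up for illustration, and it assumes all state_dicts come from the same architecture (same keys and shapes).

```python
import copy

import torch
import torch.nn as nn


def average_state_dicts(state_dicts):
    """Hypothetical helper: element-wise mean of state_dicts with identical keys and shapes."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        if avg[key].is_floating_point():
            # Stack the n tensors along a new leading dim and average over it
            stacked = torch.stack([sd[key] for sd in state_dicts], dim=0)
            avg[key] = stacked.mean(dim=0)
        # Non-float entries (e.g. BatchNorm's num_batches_tracked) are kept
        # from the first state_dict, since averaging integers is not meaningful
    return avg


torch.manual_seed(0)

# Made-up checkpoints: (validation_loss, state_dict) pairs.
# In practice these would come from your own training runs.
checkpoints = []
for val_loss in [0.9, 0.3, 0.5, 0.7]:
    model = nn.Linear(4, 2)
    checkpoints.append((val_loss, copy.deepcopy(model.state_dict())))

# Pick the n best (lowest validation loss) and average them
n = 2
best = sorted(checkpoints, key=lambda c: c[0])[:n]
avg_sd = average_state_dicts([sd for _, sd in best])

# Apply the averaged weights to a model of the same architecture
final_model = nn.Linear(4, 2)
final_model.load_state_dict(avg_sd)
```

The `copy.deepcopy` when storing each state_dict matters: `model.state_dict()` returns references to the live parameters, so without a copy the stored values would change as training continues.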