How to average two optimizers

This comes up in the context of distributed training.

If I have two Adam optimizers (used on the same model architecture), how can I create a new optimizer whose state is the average of the two?

The problem I have is that the moving averages seem to be stored in optim.state_dict()['state'], which is a dictionary where each key is a hash of a parameter (and the values are what I'm interested in). But that hash depends on the parameter itself, so given opt_a and opt_b, how would I loop over the two in a way that guarantees I'm getting the right pairings?
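
One possible angle, sketched below: in recent PyTorch releases, Optimizer.state_dict() remaps the per-parameter state keys to integer indices based on the order the parameters were passed to the optimizer, so if both optimizers were built from the same architecture with parameters in the same order, the keys should line up one-to-one. This is a minimal sketch under that assumption (the function name average_adam_states is just for illustration), not a tested solution:

```python
import copy


def average_adam_states(opt_a, opt_b):
    """Return a state_dict whose Adam moment buffers are the elementwise
    average of the two optimizers' buffers.

    Assumes both optimizers were built over the same architecture with
    parameters passed in the same order, so the integer keys in
    state_dict()['state'] match up one-to-one.
    """
    sd_a = copy.deepcopy(opt_a.state_dict())
    sd_b = opt_b.state_dict()

    for key, state_a in sd_a['state'].items():
        state_b = sd_b['state'][key]
        # Average the first and second moment estimates tensor-wise.
        for field in ('exp_avg', 'exp_avg_sq'):
            state_a[field] = (state_a[field] + state_b[field]) / 2
        # 'step' is an update count, not a moment; assuming both workers
        # have taken the same number of steps, opt_a's value is kept as-is.

    return sd_a


# Usage sketch: load the averaged state into a fresh optimizer built over
# whichever model replica you want to keep training.
# new_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
# new_opt.load_state_dict(average_adam_states(opt_a, opt_b))
```

Is relying on state_dict()'s integer indexing the right way to get the pairings, or is there a more robust approach?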