Accessing optimizer state for each layer parameter

I want to get the optimizer's state tensors for each parameter in my network.
I can do something like this to extract the optimizer statistics:

parameters = [prm for prm in optimizer.param_groups[0]['params']]
param_ = parameters[0]
state = optimizer.state
statistic1 = state[param_]['exp_avg']     # Adam's first-moment estimate
statistic2 = state[param_]['exp_avg_sq']  # Adam's second-moment estimate

Now which parameter does param_ actually correspond to? I have a transformer network where many of the encoder layers have the same shape, so I can't rely on shape information alone.
I know that torch.optim and torch.nn.Module are designed to be independent, so this might be tricky.
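
One idea: since optimizer.state is keyed by the parameter tensors themselves, I could build a reverse lookup from model.named_parameters() (assuming model is the nn.Module whose parameters were passed to the optimizer):

# Map each parameter tensor back to its name in the module
# (model is assumed to be the nn.Module the optimizer was built from)
name_of = {param: name for name, param in model.named_parameters()}

param_ = optimizer.param_groups[0]['params'][0]
print(name_of[param_])  # e.g. 'layer1.attention.key.dense.weight'
statistic1 = optimizer.state[param_]['exp_avg']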

If I look at the optimizer state dict,

optimizer_state_dict['param_groups'][0]['params']

is just a list of integer indices [0, 1, 2, …].
For example, how would I get the exp_avg for layer1.attention.key.dense.weight?
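
My best guess, assuming a single param group and that the optimizer was constructed directly from model.parameters() (so the integer indices follow the same order as model.named_parameters()), would be:

names = [name for name, _ in model.named_parameters()]
optimizer_state_dict = optimizer.state_dict()

# 'state' in the state dict is keyed by the same integer indices
idx = names.index('layer1.attention.key.dense.weight')
exp_avg = optimizer_state_dict['state'][idx]['exp_avg']

but I'm not sure this ordering is guaranteed.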
