I want to get the optimizer state tensors for each parameter in my network.

I can do something like this to extract the optimizer statistics:

```
# param_groups is a list of dicts, so collect the params from every group
parameters = [prm for group in optimizer.param_groups for prm in group['params']]
param_ = parameters[0]
state = optimizer.state
statistic1 = state[param_]['exp_avg']
statistic2 = state[param_]['exp_avg_sq']
```

Now which parameter does `param_` actually correspond to? I have a transformer network where many of the encoder layers have the same shape, so I won’t be able to rely on shape information alone.

I know that `torch.optim` and `torch.nn.Module` are independent, so this might be tricky.

If I look at the optimizer state dict,

```
optimizer_state_dict['param_groups'][0]['params']
```

is just a list of integers `[0, 1, 2, …]`, not parameter names.
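Those integers are the positions of the parameters, counted across the param groups in order. Here is a minimal sketch of turning them back into names. It uses a small hypothetical `nn.Sequential` model in place of the real transformer, and it assumes the optimizer was built from an unfiltered `model.parameters()`, so the order matches `model.named_parameters()`. With custom param groups that assumption would not hold.

```python
import torch
from torch import nn

# Hypothetical stand-in model; its parameter names are "0.weight", "0.bias", ...
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Adam creates its state lazily, so take one step before inspecting it.
model(torch.randn(2, 4)).sum().backward()
optimizer.step()

sd = optimizer.state_dict()

# Flatten the integer ids over all param groups, in order.
ids = [i for group in sd["param_groups"] for i in group["params"]]

# Assumption: the i-th id lines up with the i-th entry of named_parameters().
names = [name for name, _ in model.named_parameters()]
index_to_name = dict(zip(ids, names))

# Re-key the saved state by parameter name instead of integer id.
name_to_state = {index_to_name[i]: s for i, s in sd["state"].items()}
exp_avg = name_to_state["1.weight"]["exp_avg"]
```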

For example, how would I get the `exp_avg` for `layer1.attention.key.dense.weight`?
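One thing that might work on the live optimizer (as opposed to its state dict): `optimizer.state` is keyed by the parameter tensors themselves, so the tensor returned by `model.named_parameters()` can serve as the key. This is only a sketch on a small hypothetical model, where `"1.weight"` stands in for the real name.

```python
import torch
from torch import nn

# Hypothetical stand-in model; replace with the real transformer.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# The state is only populated after the first optimizer step.
model(torch.randn(2, 4)).sum().backward()
optimizer.step()

# Map names to the parameter objects the optimizer also holds.
named = dict(model.named_parameters())

name = "1.weight"  # stand-in for e.g. "layer1.attention.key.dense.weight"
exp_avg = optimizer.state[named[name]]["exp_avg"]
exp_avg_sq = optimizer.state[named[name]]["exp_avg_sq"]
```

This relies on the model and the optimizer sharing the same tensor objects, which is the case when the optimizer is constructed from `model.parameters()`.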