I want to get the optimizer state tensors for each parameter in my network.
I can do something like this to extract the optimizer statistics:

    # param_groups is a list of dicts, so index the group first
    parameters = [prm for prm in optimizer.param_groups[0]['params']]
    p = parameters[0]
    state = optimizer.state
    statistic1 = state[p]['exp_avg']
    statistic2 = state[p]['exp_avg_sq']
Now which parameter does p actually correspond to? I have a transformer network where many of the encoder layers have the same shape, so I won’t be able to rely on shape information alone.
I know that torch.optim and torch.nn.Module are independent, so this might be tricky.
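One thing that seems usable here: optimizer.state is keyed by the Parameter objects themselves, and model.named_parameters() yields those same objects together with their names. Below is a minimal sketch of a lookup by name. The toy model, the choice of Adam, and the name stats_by_name are my own assumptions for illustration, not part of my real setup.

```python
import torch
import torch.nn as nn

# Hypothetical toy model: two layers with identical shapes on purpose,
# so shape alone cannot tell the parameters apart.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Take one step so that Adam populates its per-parameter state.
model(torch.randn(2, 4)).sum().backward()
optimizer.step()

# optimizer.state is keyed by the Parameter objects, so the names
# reported by the module can be attached to each state entry.
stats_by_name = {
    name: optimizer.state[p]
    for name, p in model.named_parameters()
    if p in optimizer.state  # skip parameters that have no state yet
}

for name, st in stats_by_name.items():
    print(name, tuple(st["exp_avg"].shape), tuple(st["exp_avg_sq"].shape))
```

This only covers parameters that were actually passed to the optimizer and have been stepped at least once; frozen or unused parameters would simply be missing from the dictionary.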
If I look at the optimizer state dict (optimizer.state_dict()), the params entry in each param group is just a list of numbers [0, 1, 2, …].
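As far as I can tell, those numbers are assigned in the order the parameters appear across optimizer.param_groups, so they can be lined up with the live Parameter objects and from there with the module's names. A rough sketch under that assumption (the toy model and the names ids, name_of and id_to_name are made up for the example):

```python
import torch
import torch.nn as nn

# Hypothetical toy model and optimizer, stepped once to create state.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 4))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
model(torch.randn(2, 4)).sum().backward()
optimizer.step()

sd = optimizer.state_dict()

# Integer ids from the state dict, flattened over all param groups.
ids = [i for group in sd["param_groups"] for i in group["params"]]
# Live Parameter objects, flattened in the same order.
params = [p for group in optimizer.param_groups for p in group["params"]]
# Reverse lookup from Parameter object (by identity) to its module name.
name_of = {id(p): name for name, p in model.named_parameters()}

# Pair each integer id with the name of the parameter at that position.
id_to_name = {i: name_of[id(p)] for i, p in zip(ids, params)}
print(id_to_name)

# The same integer ids index the "state" section of the state dict.
print(tuple(sd["state"][ids[0]]["exp_avg"].shape))
```

The mapping relies on the optimizer having been built from this model's parameters; with a state dict loaded from elsewhere, the ordering would have to match the way the param groups were originally constructed.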
For example, how would I get the