Monitor optimizer step - Adam

Johannes_Vogt · April 29, 2025, 10:34am

Hey,

I am trying to monitor the optimizer steps, that is, $\Delta W$ after it has been computed by the optimizer and before it is applied to the weights.

I started with

for param_group in optimizer.param_groups:
    for param in param_group["params"]:
        state = optimizer.state[param]

For an Adam optimizer, each state has the keys max_exp_avg_sq, exp_avg_sq and exp_avg.

What do they mean and which of them is used as $\Delta W$?

qq-me · April 30, 2025, 10:10am

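The state tracks the first moment (exp_avg, a running average of the gradients), the second moment (exp_avg_sq, a running average of the squared gradients) and the running maximum of the second moment (max_exp_avg_sq, only kept when amsgrad=True), because the update rule uses them. None of them is $\Delta W$ itself; most optimizers don't store delta W explicitly.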
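
For reference, here is a sketch of how these quantities enter the standard Adam update, assuming no weight decay. The symbols $g_t$ (gradient), $m_t$, $v_t$, $\beta_1$, $\beta_2$, $\eta$ (learning rate) and $\epsilon$ are my notation, not names from the state dict:

$$
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\Delta W_t &= -\eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

Here $m_t$ corresponds to exp_avg and $v_t$ to exp_avg_sq. With amsgrad, the running maximum of $v_t$ (max_exp_avg_sq) takes the place of $v_t$ in the denominator. So $\Delta W$ is derived from the state together with the learning rate and the step count, and is not stored under any single key.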

You can, however, clone and store the parameters before the optimizer step, perform the step, and take the difference between the parameters after and before it.
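
A minimal sketch of that clone-and-diff approach. The toy model, the random data and the helper name step_with_deltas are made up for illustration; only the torch calls themselves are real API:

```python
import torch

torch.manual_seed(0)

# Toy setup, purely illustrative
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)

x = torch.randn(8, 4)
loss = model(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()


def step_with_deltas(optimizer):
    """Hypothetical helper: run optimizer.step() and return (params, deltas)."""
    params = [p for group in optimizer.param_groups for p in group["params"]]
    # Snapshot the weights before the update
    before = [p.detach().clone() for p in params]
    optimizer.step()
    # Delta W = weights after the step minus weights before the step
    deltas = [p.detach() - b for p, b in zip(params, before)]
    return params, deltas


params, deltas = step_with_deltas(optimizer)
for p, d in zip(params, deltas):
    print(tuple(p.shape), d.abs().max().item())
```

Note that this gives you $\Delta W$ only after it has already been applied. If you need it before the weights change, you could subtract the delta again or reconstruct it from the state, but the diff is the simplest thing to log. Cloning every parameter costs extra memory, so for large models you may want to do this only every N steps.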