Hey,
I am trying to monitor the optimizer steps, i.e. the update $\Delta W$ after it comes out of the optimizer and before it is applied to the weights.
I started with:
```python
for param_group in optimizer.param_groups:
    for param in param_group["params"]:
        state = optimizer.state[param]  # per-parameter optimizer state (a dict)
```
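The loop above only reads the optimizer's internal state. A different way to get $\Delta W$ is to measure it directly: snapshot every parameter right before `optimizer.step()` and subtract afterwards. Because `step()` updates the parameters in place, the difference is exactly the update that was applied, up to float rounding. The following is a minimal sketch. The toy `nn.Linear` model, the random data and `lr=1e-3` are hypothetical choices made for illustration.

```python
import torch

# Hypothetical toy setup, only for illustration
torch.manual_seed(0)
model = torch.nn.Linear(3, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 3)
loss = model(x).pow(2).mean()
optimizer.zero_grad()
loss.backward()

# Snapshot every parameter just before the optimizer step
before = [(p, p.detach().clone())
          for group in optimizer.param_groups
          for p in group["params"]]

optimizer.step()

# Delta W = W_after - W_before: the update the optimizer actually applied
deltas = [(p, p.detach() - w0) for p, w0 in before]

for p, d in deltas:
    print(tuple(p.shape), "max |dW| =", d.abs().max().item())
```

On the very first Adam step, each entry of $\Delta W$ is close to $-\text{lr}\cdot\operatorname{sign}(g)$, so the printed maxima should be just under `1e-3`. Logging `d.norm()` per parameter on every iteration gives the update magnitude over training.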
For an Adam optimizer, each state has the keys `max_exp_avg_sq`, `exp_avg_sq` and `exp_avg`.
What do they mean and which of them is used as $\Delta W$?
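In the standard Adam update, none of these keys is $\Delta W$ by itself:

* `exp_avg` is an exponential moving average of the gradient (the first moment).
* `exp_avg_sq` is an exponential moving average of the squared gradient (the second moment).
* `max_exp_avg_sq` is the running maximum of `exp_avg_sq`. `torch.optim.Adam` only creates it when `amsgrad=True`.

$\Delta W$ is computed from these values on every step. The scalar sketch below follows that update rule, arranged the way `torch.optim.Adam` applies the bias corrections. It assumes no weight decay and `maximize=False`. `adam_delta` is a hypothetical helper, not part of PyTorch, and it works on plain floats instead of tensors.

```python
import math


def adam_delta(state, grad, lr=1e-3, betas=(0.9, 0.999), eps=1e-8, amsgrad=False):
    """One Adam update for a single scalar weight (hypothetical helper).

    `state` mimics optimizer.state[param] and is updated in place;
    the return value is Delta W, the amount added to the weight.
    """
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    # exp_avg: moving average of the gradient (first moment, m_t)
    state["exp_avg"] = beta1 * state.get("exp_avg", 0.0) + (1 - beta1) * grad
    # exp_avg_sq: moving average of the squared gradient (second moment, v_t)
    state["exp_avg_sq"] = beta2 * state.get("exp_avg_sq", 0.0) + (1 - beta2) * grad * grad
    second = state["exp_avg_sq"]
    if amsgrad:
        # max_exp_avg_sq: running max of exp_avg_sq, only kept when amsgrad=True
        state["max_exp_avg_sq"] = max(state.get("max_exp_avg_sq", 0.0), second)
        second = state["max_exp_avg_sq"]
    # Bias corrections compensate for the zero-initialised moving averages
    bias_correction1 = 1 - beta1 ** t
    bias_correction2 = 1 - beta2 ** t
    denom = math.sqrt(second) / math.sqrt(bias_correction2) + eps
    return -(lr / bias_correction1) * state["exp_avg"] / denom


state = {}
for g in [0.5, -0.2, 0.1]:
    dw = adam_delta(state, g)
    print(f"step {state['step']}: delta_w = {dw:.6g}")
```

Written out, the sketch computes $\Delta W = -\frac{\text{lr}}{1-\beta_1^t}\,\frac{m_t}{\sqrt{v_t}/\sqrt{1-\beta_2^t} + \epsilon}$, with $m_t$ = `exp_avg` and $v_t$ = `exp_avg_sq` (or `max_exp_avg_sq` with AMSGrad). Reconstructing $\Delta W$ from the state therefore also needs `lr`, `betas`, `eps` and the step count.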