The Adam weight update is defined as

w_{t+1} = w_t - \eta \left( \frac{1}{\sqrt{v_t} + \epsilon} \right) m_t


I would like to plot the “effective learning rate” of my optimizer during training, which requires knowing the moment estimates v_t and m_t. Does anyone know how I can extract these from an Adam optimizer?

You can grab these values by doing this:

```
def get_betas(optim):
    # each param group stores its (beta1, beta2) pair
    for group in optim.param_groups:
        return group['betas']
```

These are just the beta1 and beta2 hyperparameters used to calculate the running averages of the gradients, right? I want the actual moment estimates, which incorporate these betas along with the gradients.

Reading through the source (torch.optim.adam — PyTorch 1.9.1 documentation), these values are stored in the per-parameter optimizer state as `exp_avg` and `exp_avg_sq`. You should be able to access them with something like this:

```
for group in optim.param_groups:
    for p in group['params']:
        # optimizer state is keyed by the parameter tensor itself
        state = optim.state[p]
        print(state['exp_avg'], state['exp_avg_sq'])
```
