The Adam weight update is defined as
w_{t+1} = w_t - \eta \left( \frac{1}{\sqrt{v_t} + \epsilon} \right) m_t
I would like to plot the “effective learning rate” of my optimizer during training, which requires knowing the moment estimates v_t and m_t. Does anyone know how I can extract these from an Adam optimizer?
You can grab these values like this:

def get_betas(optim):
    for group in optim.param_groups:
        return group['betas']
These are just the beta1 and beta2 hyperparameters used to calculate running averages of the gradients, right? I want to know the actual moment estimates, which incorporate these betas along with the gradients.
Reading through the source (torch.optim.adam — PyTorch 1.9.1 documentation), these values are stored in the optimizer state as exp_avg and exp_avg_sq. You should be able to access them with something like this:
# note: the state dict is empty until the first optim.step() call
for group in optim.param_groups:
    for p in group['params']:
        state = optim.state[p]
        print(state['exp_avg'], state['exp_avg_sq'])
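To get the "effective learning rate" from the original question, you can combine these state entries with the group's hyperparameters and apply the bias correction from the Adam paper. A minimal runnable sketch (the tiny Linear model and the single training step are just illustrative assumptions, not part of the answer above):

```python
import torch

# Hypothetical small model and optimizer, just to populate Adam's state.
model = torch.nn.Linear(4, 2)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step so exp_avg / exp_avg_sq exist in optim.state.
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optim.step()

for group in optim.param_groups:
    lr, eps = group['lr'], group['eps']
    beta1, beta2 = group['betas']
    for p in group['params']:
        state = optim.state[p]
        step = state['step']
        # Bias-corrected moment estimates.
        m_hat = state['exp_avg'] / (1 - beta1 ** step)
        v_hat = state['exp_avg_sq'] / (1 - beta2 ** step)
        # Per-element effective learning rate: eta / (sqrt(v_hat) + eps).
        eff_lr = lr / (v_hat.sqrt() + eps)
        print(eff_lr.mean().item())
```

Logging eff_lr.mean() (or a histogram of it) each step should give you the plot you described.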