The Adam weight update is defined as
w_{t+1} = w_t - \eta \left( \frac{1}{\sqrt{v_t} + \epsilon} \right) m_t
I would like to plot the “effective learning rate” of my optimizer during training, which requires knowing the moment estimates v_t and m_t. Does anyone know how I can extract these from an Adam optimizer?
You can grab these values like this:

def get_betas(optim):
    for group in optim.param_groups:
        return group['betas']
These are just the beta1 and beta2 hyperparameters used to calculate running averages of the gradients, right? I want to know the actual moment estimates, which incorporate these betas along with the gradients.
Reading through the source (torch.optim.adam — PyTorch 1.9.1 documentation), these values are stored in the optimizer state as exp_avg and exp_avg_sq. You should be able to access them with something like this:
# note: the state dict is empty until the first optim.step() call
for group in optim.param_groups:
    for p in group['params']:
        state = optim.state[p]
        print(state['exp_avg'], state['exp_avg_sq'])
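To get the "effective learning rate" from the original question, you can combine these state entries with the group's hyperparameters and apply the bias correction from the Adam paper. A minimal runnable sketch (the tiny Linear model and the single training step are just illustrative assumptions, not part of the answer above):

```python
import torch

# Hypothetical small model and optimizer, just to populate Adam's state.
model = torch.nn.Linear(4, 2)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step so exp_avg / exp_avg_sq exist in optim.state.
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optim.step()

for group in optim.param_groups:
    lr, eps = group['lr'], group['eps']
    beta1, beta2 = group['betas']
    for p in group['params']:
        state = optim.state[p]
        step = state['step']
        # Bias-corrected moment estimates.
        m_hat = state['exp_avg'] / (1 - beta1 ** step)
        v_hat = state['exp_avg_sq'] / (1 - beta2 ** step)
        # Per-element effective learning rate: eta / (sqrt(v_hat) + eps).
        eff_lr = lr / (v_hat.sqrt() + eps)
        print(eff_lr.mean().item())
```

Logging eff_lr.mean() (or a histogram of it) each step should give you the plot you described.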