Adam: how to output 1st and 2nd moments m and v

robinho · July 15, 2022, 4:52am

With Adam, how can I output intermediate values such as the 1st and 2nd moments (m and v) during training?

ptrblck · July 15, 2022, 5:52am

You can check optimizer.param_groups or optimizer.state_dict().

robinho · July 15, 2022, 9:06am

Thank you, but all I can see are the trained parameters etc., not the moments.

ptrblck · July 15, 2022, 4:27pm

I can see the entire state including the running averages:

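import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One forward/backward pass and one optimizer step so the state gets populated
out = model(torch.randn(1, 1))
out.backward()
optimizer.step()

# 'state' holds the per-parameter running averages (exp_avg, exp_avg_sq)
print(optimizer.state_dict())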
# {'state': {0: {'step': tensor(1.), 'exp_avg': tensor([[-0.0219]]), 'exp_avg_sq': tensor([[4.7941e-05]])}, 1: {'step': tensor(1.), 'exp_avg': tensor([0.1000]), 'exp_avg_sq': tensor([0.0010])}}, 'param_groups': [{'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'maximize': False, 'foreach': None, 'capturable': False, 'params': [0, 1]}]}
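
If you want the values during training rather than one big dict at the end, a minimal sketch along these lines may help. It assumes you look up each parameter in optimizer.state (keyed by the parameter tensor itself); the loop, the toy loss, and the name moment_log are only illustrative and not part of the code above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(1, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

moment_log = []  # hypothetical container: (iteration, param name, m, v)

for it in range(3):
    optimizer.zero_grad()
    # Toy loss on random data, only there to produce gradients
    loss = model(torch.randn(4, 1)).pow(2).mean()
    loss.backward()
    optimizer.step()

    # optimizer.state maps each parameter tensor to its own state dict
    for name, p in model.named_parameters():
        state = optimizer.state[p]
        # clone() because the optimizer updates these tensors in place
        moment_log.append(
            (it, name, state["exp_avg"].clone(), state["exp_avg_sq"].clone())
        )

for it, name, m, v in moment_log:
    print(it, name, m, v)
```

The state for a parameter only exists after the first optimizer.step(), so the lookup is done after the step.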

robinho · July 16, 2022, 2:53am

Thank you, I see the same output, but what is exp_avg, and which entries are the moments?

robinho · August 4, 2022, 3:28am

could anyone help with my last question pls thanks

IneedMrmeeseeks · August 4, 2022, 8:03am

From my reading of the Adam paper and the source code:
exp_avg is the exponential moving average of the gradient, i.e. the 1st moment vector (m).
exp_avg_sq is the exponential moving average of the squared gradient, i.e. the 2nd moment vector (v).
betas are the exponential decay rates for these two averages.

Have a nice day.
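
To make the link between the names concrete, here is a rough plain-Python sketch of the two moving averages for a single scalar gradient. The function update_moments is a hypothetical helper written for illustration, not a PyTorch API; the default betas are the ones shown in param_groups above. With a gradient of 1.0 (which is what the bias of the Linear layer receives in the earlier example), it gives approximately the 0.1000 and 0.0010 printed for parameter 1.

```python
def update_moments(m, v, grad, beta1=0.9, beta2=0.999):
    """One update of Adam's moving averages for a scalar gradient (illustrative helper)."""
    m = beta1 * m + (1 - beta1) * grad         # 1st moment, stored as exp_avg
    v = beta2 * v + (1 - beta2) * grad * grad  # 2nd moment, stored as exp_avg_sq
    return m, v


# Both averages start at zero; take one step with gradient 1.0
m, v = update_moments(0.0, 0.0, grad=1.0)
print(m, v)  # approximately 0.1 and 0.001

# Bias-corrected estimates from the Adam paper for step t = 1
t = 1
m_hat = m / (1 - 0.9 ** t)
v_hat = v / (1 - 0.999 ** t)
print(m_hat, v_hat)  # both approximately 1.0
```

Note that the values in the state dict are the raw averages; after one step with gradient 1.0 the stored exp_avg is about 0.1, not the bias-corrected 1.0.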