Access the adaptive learning rates for different torch optimizers

Hello,
Different optimization algorithms such as Adam, Adagrad, and RMSprop adapt their step size according to the gradients. For example, Adagrad accumulates the sum of squared gradients and scales the learning rate by its square root, while RMSprop keeps an exponential moving average of the squared gradients (controlled by a smoothing constant) instead, according to the torch.optim docs here - torch.optim — PyTorch 1.12 documentation
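To make concrete what I mean by the adaptive scaling (just a rough sketch of the textbook update rules, not PyTorch's exact implementation; the hyperparameter values below are only placeholders):

```python
import torch

grad = torch.randn(3)          # pretend gradient for one parameter tensor
lr, eps, alpha = 1e-2, 1e-8, 0.99

# Adagrad: accumulate squared gradients and divide the lr by their square root
state_sum = torch.zeros(3)
state_sum += grad ** 2
adagrad_scale = lr / (state_sum.sqrt() + eps)

# RMSprop: exponential moving average of squared gradients instead of a full sum
square_avg = torch.zeros(3)
square_avg = alpha * square_avg + (1 - alpha) * grad ** 2
rmsprop_scale = lr / (square_avg.sqrt() + eps)

print(adagrad_scale, rmsprop_scale)  # the per-parameter "effective" step sizes
```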

I was wondering if there is any clean way of accessing these adaptive learning rates rather than computing them manually. Is there some instance variable that I can access externally from the optimizer object? If this is not present already, it would be great to have.

Please do let me know and thanks for your time.
Megh

You can check the state_dict of the optimizer and see the internal tracking stats:

import torch
import torch.nn as nn

model = nn.Linear(2, 2, bias=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# run a forward/backward pass and a single optimizer step
out = model(torch.randn(1, 2))
out.mean().backward()
optimizer.step()

print(optimizer.state_dict())
# {'state': {0: {'step': tensor(1.), 'exp_avg': tensor([[-0.0394, -0.0382],
#         [-0.0394, -0.0382]]), 'exp_avg_sq': tensor([[0.0002, 0.0001],
#         [0.0002, 0.0001]])}, 1: {'step': tensor(1.), 'exp_avg': tensor([0.0500, 0.0500]), 'exp_avg_sq': tensor([0.0003, 0.0003])}}, 'param_groups': [{'lr': 0.001, 'betas': (0.9, 0.999), 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False, 'maximize': False, 'foreach': None, 'capturable': False, 'params': [0, 1]}]}
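If you want the effective per-parameter step rather than the raw moments, you can reconstruct it from this state. A minimal sketch continuing from the snippet above, based on the Adam update rule (not an official API; the exact placement of eps and the bias corrections may differ slightly from the internal implementation):

```python
lr = optimizer.param_groups[0]['lr']
beta1, beta2 = optimizer.param_groups[0]['betas']
eps = optimizer.param_groups[0]['eps']

for p in model.parameters():
    state = optimizer.state[p]   # per-parameter state: step, exp_avg, exp_avg_sq
    t = float(state['step'])
    # bias-corrected first and second moments, as in the Adam paper
    m_hat = state['exp_avg'] / (1 - beta1 ** t)
    v_hat = state['exp_avg_sq'] / (1 - beta2 ** t)
    # approximate per-parameter update applied by Adam at this step
    print(lr * m_hat / (v_hat.sqrt() + eps))
```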

Please don’t tag specific users, as it can discourage others from posting an answer.


Hi, thanks for this and my apologies for tagging specific users. I have edited my OP.