Adaptive learning rate

I am also porting ReduceLROnPlateau. This should be pretty light and straightforward.


I know that it is “Better to Remain Silent and Be Thought a Fool than to Speak and Remove All Doubt” but I still don’t know how to save the optimizer internals along with the method so everything resumes as if nothing had happened.

I would appreciate an answer with code.


A PR has been merged to implement LR schedulers here. This is not part of a release yet. For now, you can use LR scheduling like this:

def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7):
    """Decay learning rate by a factor of 0.1 every lr_decay_epoch epochs."""
    lr = init_lr * (0.1**(epoch // lr_decay_epoch))

    if epoch % lr_decay_epoch == 0:
        print('LR is set to {}'.format(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return optimizer

I think that the return statement should be return optimizer instead of return optimiser :slight_smile:

Yep! Edited it! Damn autocorrect.

Another thing is that this scheduler can be used only if all groups have the same learning rate.
I would change the method to:

def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Decay learning rate by a factor of lr_decay every lr_decay_epoch epochs"""
    if epoch == 0 or epoch % lr_decay_epoch:
        return optimizer
    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay
    return optimizer

You don’t need the initial learning rate as a parameter here because the optimizer already has the initial learning rate passed in the constructor. So all the learning rates are already initialized with the initial_learning_rate :slight_smile:
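For what it's worth, here is a rough sketch of how a function like this could be called from a training loop when the groups start at different rates. The function is repeated so the snippet runs on its own, and it skips epoch 0 so the rates passed to the constructor are used unchanged at first (my assumption about the intent). The `SimpleNamespace` object is only a stand-in for a real `torch.optim` optimizer so that the example runs without PyTorch; the two starting rates are made up.

```python
from types import SimpleNamespace


def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Multiply every group's LR by lr_decay once every lr_decay_epoch epochs."""
    # Assumption: epoch 0 is skipped so the initial rates apply at first.
    if epoch == 0 or epoch % lr_decay_epoch:
        return optimizer
    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay
    return optimizer


# Stand-in for a torch.optim optimizer with two groups at different rates.
optimizer = SimpleNamespace(param_groups=[{'lr': 0.01}, {'lr': 0.001}])

for epoch in range(15):
    optimizer = exp_lr_scheduler(optimizer, epoch)
    # ... train for one epoch with the current rates ...

# Both groups were decayed twice (at epochs 7 and 14), keeping their ratio.
final_lrs = [group['lr'] for group in optimizer.param_groups]
print(final_lrs)
```

Because each group is scaled in place, the 10x gap between the two groups is preserved, which the version that overwrites every group with a single `lr` cannot do.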



Thanks for the answers.
But how would this function be used, and what would be saved to disk besides the model?


For an example of LR scheduling and saving the model, you can refer to the ImageNet example or the transfer learning tutorial.


I will study them.

So the answer seems to be to save the network and the optimizer.
I will try it.
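In case it helps anyone reading later, this is a minimal sketch of what "save the network and the optimizer" could look like. It is not taken from the linked examples: the toy model, the dictionary keys (`'epoch'`, `'model'`, `'optimizer'`) and the file name are all my own placeholders.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical toy model; any nn.Module works the same way.
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step so the optimizer has internal state (momentum buffers).
loss = model(torch.randn(8, 4)).sum()
loss.backward()
optimizer.step()

# Save everything needed to resume; the key names are arbitrary.
path = os.path.join(tempfile.mkdtemp(), 'checkpoint.pth')
torch.save({'epoch': 3,
            'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, path)

# Later: rebuild the objects the same way, then restore their state.
new_model = nn.Linear(4, 2)
new_optimizer = torch.optim.SGD(new_model.parameters(), lr=0.5, momentum=0.9)
checkpoint = torch.load(path)
new_model.load_state_dict(checkpoint['model'])
new_optimizer.load_state_dict(checkpoint['optimizer'])
start_epoch = checkpoint['epoch'] + 1

# The saved learning rate replaces the one given to the new constructor.
restored_lr = new_optimizer.param_groups[0]['lr']
```

Since the optimizer's `state_dict()` includes the `param_groups`, any learning rate you changed by hand before saving comes back on load, along with the momentum buffers.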


I would love to see these functions get added to master; it probably doesn't make sense for everyone to keep rewriting this on their own.

They are. See PR #1370.


Any update on this?
What would be the correct way to do so in the latest version (0.2)?

Thank You.

With this

@apaszke why reconstruct the optimizer instead of updating optimizer.param_groups?
Reconstructing the optimizer, i.e. optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate), can create a spike in the loss function…
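To illustrate the difference, here is a small sketch (toy model and rates are made up) that compares the two options. One plausible reason for the spike is that a freshly built Adam has none of the per-parameter running averages that the old one had accumulated, whereas editing `param_groups` leaves them alone.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Take one step so Adam builds its per-parameter running averages.
model(torch.randn(8, 4)).sum().backward()
optimizer.step()
entries_before = len(optimizer.state)

# Option 1: change the rate in place; the running averages are kept.
for param_group in optimizer.param_groups:
    param_group['lr'] = 1e-4
entries_after_update = len(optimizer.state)

# Option 2: build a new optimizer; it starts with no accumulated state.
rebuilt = torch.optim.Adam(model.parameters(), lr=1e-4)
entries_after_rebuild = len(rebuilt.state)

print(entries_before, entries_after_update, entries_after_rebuild)
```

The in-place update keeps one state entry per parameter (weight and bias here), while the rebuilt optimizer starts empty and has to re-estimate its averages from scratch.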

What about using the load_state_dict on a modified version of it?

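def adjust_learning_rate(optimizer, iter, each, init_lr):
    # Sets the learning rate to init_lr decayed by 0.1 every 'each' iterations.
    # init_lr is a placeholder name: the base value was missing from the post.
    lr = init_lr * (0.1 ** (iter // each))
    state_dict = optimizer.state_dict()
    for param_group in state_dict['param_groups']:
        param_group['lr'] = lr
    # state_dict() returns copies of the groups, so load the modified version back.
    optimizer.load_state_dict(state_dict)
    return lr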

Apologies for bumping this old thread. Putting this here for future reference. The way to do this would be to use the lr_scheduler.
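A minimal sketch of what that could look like with `torch.optim.lr_scheduler.StepLR`, using the same numbers as the hand-written `exp_lr_scheduler` earlier in the thread (start at 0.001, multiply by 0.1 every 7 epochs). The toy model and the `lrs` list are only there for illustration.

```python
import torch
import torch.nn as nn
from torch.optim import lr_scheduler

model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Multiply every group's rate by 0.1 every 7 epochs.
scheduler = lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)

lrs = []
for epoch in range(15):
    # Record the rate that is in effect during this epoch.
    lrs.append(optimizer.param_groups[0]['lr'])
    # ... forward/backward passes for one epoch would go here ...
    optimizer.step()
    scheduler.step()
```

The scheduler works on `optimizer.param_groups` for you, so groups with different starting rates each decay from their own value. `lr_scheduler.ReduceLROnPlateau` lives in the same module; its `step()` takes the metric being monitored, e.g. `scheduler.step(val_loss)`. Schedulers also have `state_dict()` and `load_state_dict()`, so they can be checkpointed next to the model and optimizer.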

Along with that, it seems, options like this are available.

So the learning rate is stored in optim.param_groups[i]['lr']. optim.param_groups is a list of the different weight groups, which can have different learning rates. Thus, you can simply do:

for g in optim.param_groups:
    g['lr'] = 0.001

Hey, how to do this with libtorch in C++? I have no param_groups in the optimizer :frowning: