Adaptive learning rate

Jiaming_Liu · April 25, 2017, 10:07pm

I am also porting ReduceLROnPlateau. This should be pretty light and straightforward. https://github.com/Jiaming-Liu/pytorch-lr-scheduler

educob · June 11, 2017, 6:20pm

I know that it is “Better to Remain Silent and Be Thought a Fool than to Speak and Remove All Doubt” but I still don’t know how to save the optimizer internals along with the method so everything resumes as if nothing had happened.

I would appreciate an answer with code.

Thanks.

chsasank · June 11, 2017, 6:43pm

A PR has been merged to implement LR schedulers here. This is not part of a release yet. For now, you can use LR scheduling like this:

def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7):
    """Decay learning rate by a factor of 0.1 every lr_decay_epoch epochs."""
    lr = init_lr * (0.1**(epoch // lr_decay_epoch))

    if epoch % lr_decay_epoch == 0:
        print('LR is set to {}'.format(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return optimizer

AndreiCostinescu · June 11, 2017, 8:14pm

I think that the return statement should be return optimizer instead of return optimiser

chsasank · June 11, 2017, 8:27pm

Yep! Edited it! Damn autocorrect.

AndreiCostinescu · June 11, 2017, 8:33pm

Nice!
Another thing is that this scheduler can be used only if all groups have the same learning rate.
I would change the method to:

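def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Decay learning rate by a factor of lr_decay every lr_decay_epoch epochs."""
    # Leave the rates alone at epoch 0 (so training starts at the initial rate)
    # and on every epoch that is not a multiple of lr_decay_epoch.
    if epoch == 0 or epoch % lr_decay_epoch:
        return optimizer

    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay
    return optimizer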

You don’t need the initial learning rate as a parameter here because the optimizer already receives it in its constructor, so every param group starts out at its own initial learning rate.
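To make the per-group point concrete, here is a minimal sketch that needs no PyTorch at all. The `SimpleNamespace` object is a hypothetical stand-in that exposes only `param_groups`, and `decay_all_groups` is an illustrative name, not something from the library. It assumes the decay should be skipped at epoch 0 so that training starts at the initial rates.

```python
from types import SimpleNamespace


def decay_all_groups(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Multiply every group's lr by lr_decay on epochs 7, 14, ... (epoch 0 is skipped)."""
    if epoch > 0 and epoch % lr_decay_epoch == 0:
        for group in optimizer.param_groups:
            group['lr'] *= lr_decay
    return optimizer


# Hypothetical stand-in: two groups that start at different learning rates.
optimizer = SimpleNamespace(param_groups=[{'lr': 0.01}, {'lr': 0.001}])

for epoch in range(15):
    decay_all_groups(optimizer, epoch)

# Both groups were scaled twice (epochs 7 and 14), so their ratio is unchanged.
lrs = [group['lr'] for group in optimizer.param_groups]
ratio = lrs[0] / lrs[1]
```

Because the update is multiplicative, the groups keep their relative rates. Assigning one computed `lr` to every group, as in the earlier version, would collapse them to the same value.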

educob · June 12, 2017, 11:11am

Hi.

Thanks for the answers.
But how would this function be used, and what would be saved to disk besides the model?

Thanks.

chsasank · June 12, 2017, 11:20am

For an example on LR scheduling and saving the model, you can refer to imagenet example or transfer learning tutorial.

educob · June 13, 2017, 7:29am

Thanks.

I will study them.

educob · June 14, 2017, 7:02pm

So the answer seems to be to save both the network and the optimizer.
I will try it.
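
For anyone landing here later, a minimal sketch of what that could look like, assuming a recent PyTorch. The helper names `save_checkpoint` and `load_checkpoint` and the dictionary keys are illustrative choices, and an in-memory buffer stands in for a file on disk.

```python
import io

import torch
from torch import nn, optim


def save_checkpoint(model, optimizer, epoch, f):
    """Store the weights, the optimizer internals (lr, momentum buffers) and the epoch."""
    torch.save({'epoch': epoch,
                'model': model.state_dict(),
                'optimizer': optimizer.state_dict()}, f)


def load_checkpoint(model, optimizer, f):
    """Restore both objects in place and return the epoch that was saved."""
    checkpoint = torch.load(f)
    model.load_state_dict(checkpoint['model'])
    optimizer.load_state_dict(checkpoint['optimizer'])
    return checkpoint['epoch']


torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# One training step so the optimizer actually holds momentum buffers.
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
for group in optimizer.param_groups:
    group['lr'] *= 0.1  # pretend a decay step already happened

buffer = io.BytesIO()  # stand-in for a path such as a checkpoint file
save_checkpoint(model, optimizer, 3, buffer)
buffer.seek(0)

# Later: rebuild the objects the same way, then load the saved state into them.
new_model = nn.Linear(4, 2)
new_optimizer = optim.SGD(new_model.parameters(), lr=0.01, momentum=0.9)
start_epoch = load_checkpoint(new_model, new_optimizer, buffer) + 1
```

The decayed learning rate lives in the optimizer's `state_dict`, so it comes back together with the momentum buffers and training can continue from `start_epoch`.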

Thanks.

deepcode · June 19, 2017, 7:14pm

I would love to see these functions get added to master. It probably doesn't make sense for everyone to keep rewriting this on their own.

chsasank · June 19, 2017, 7:58pm

They are. See PR #1370.

Royi · November 19, 2017, 3:43pm

Any update on this?
What would be the correct way to do so in the latest version (0.2)?

Thank You.

Jules_Gagnon-Marchan · January 11, 2018, 7:05pm

With this
http://pytorch.org/docs/master/_modules/torch/optim/lr_scheduler.html#ExponentialLR
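
A rough usage sketch, assuming a PyTorch version that ships `torch.optim.lr_scheduler`. The model, the random data and `gamma=0.9` are arbitrary placeholders.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ExponentialLR

torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)  # lr is multiplied by gamma each epoch

lr_history = []
for epoch in range(3):
    lr_history.append(optimizer.param_groups[0]['lr'])

    # Placeholder for a real epoch of training.
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()

    # Step the scheduler once per epoch, after the optimizer has stepped.
    scheduler.step()

final_lr = optimizer.param_groups[0]['lr']
```

The scheduler edits `optimizer.param_groups` in place, so the optimizer is never reconstructed and its internal state is kept.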

Rafael_Valle · January 18, 2018, 7:38pm

@apaszke why reconstruct the optimizer instead of updating optimizer.param_groups?
Reconstructing the optimizer, i.e. optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate), can create a spike in the loss function…

Moreo · February 22, 2018, 5:22pm

What about using the load_state_dict on a modified version of it?

def adjust_learning_rate(optimizer, iter, each):
    # sets the learning rate to the initial LR decayed by 0.1 every 'each' iterations
    lr = args.lr * (0.1 ** (iter // each))
    state_dict = optimizer.state_dict()
    for param_group in state_dict['param_groups']:
        param_group['lr'] = lr
    optimizer.load_state_dict(state_dict)
    return lr

shubhvachher · June 26, 2019, 11:18am

Apologies for bumping this old thread. Putting this here for future reference. The way to do this is with lr_scheduler (torch.optim, PyTorch 2.1 documentation).

Along with that, it seems, options like this are available.

So the learning rate is stored in optim.param_groups[i]['lr']. optim.param_groups is a list of the different weight groups, which can have different learning rates. Thus, you can simply do:

for g in optim.param_groups:
    g['lr'] = 0.001
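
For the adaptive case this thread started with, the same lr_scheduler module has ReduceLROnPlateau. A small sketch under assumed settings: the validation losses are made up, and `factor=0.5` and `patience=2` are arbitrary values.

```python
import torch
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

torch.manual_seed(0)
model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Halve the lr once the monitored value has stopped improving for more than 2 epochs.
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=2)

# Made-up validation losses: one improvement, then a long plateau.
val_losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]

lr_history = []
for val_loss in val_losses:
    # Placeholder for a real epoch of training.
    optimizer.zero_grad()
    model(torch.randn(8, 4)).pow(2).mean().backward()
    optimizer.step()

    # Unlike the other schedulers, step() takes the metric being monitored.
    scheduler.step(val_loss)
    lr_history.append(optimizer.param_groups[0]['lr'])
```

Under the hood it changes `g['lr']` on each param group, just like the manual loop.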

G_B · February 23, 2020, 7:26pm

Hey, how to do this with libtorch in C++? I have no param_groups in the optimizer