Adaptive learning rate

It’s likely that the LR was lowered by the optimizer and the new one doesn’t know about it, so it applies updates that are too large for a few iterations. You can always safely modify the param_groups dict, but I guess we’ll need to figure out a better way.

Yeah, a cleaner method would be appreciated. I’m using @trypag’s method now, but it seems brittle to modify the internals of the optimizer and redundant for everyone to have to write this themselves.

What about more general situations where it is desirable to adaptively adjust any parameter involved in training? Is it practical to reconstruct the optimizer in every loop? Maybe a mechanism like a placeholder?

Thanks!

@lliu25 In that case, define a Python function that changes the param_groups as you wish (after every iteration or epoch). Placeholders are unnecessary.
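
For concreteness, here is a minimal sketch of that idea; the function name adjust_hyperparams and the exact schedules are made up for illustration. Any value stored in optimizer.param_groups, not only the learning rate, can be rewritten between steps and the optimizer picks it up on the next step().

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def adjust_hyperparams(optimizer, epoch):
    # Rewrite whatever entries you like in each param group; the optimizer
    # reads these values on its next step(), so no reconstruction is needed.
    for group in optimizer.param_groups:
        group['lr'] = 0.1 * (0.5 ** (epoch // 10))          # halve the LR every 10 epochs
        group['momentum'] = min(0.99, 0.9 + 0.001 * epoch)  # slowly raise momentum

for epoch in range(30):
    adjust_hyperparams(optimizer, epoch)
    # ... training iterations for this epoch ...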

@smth True. That does make a placeholder unnecessary.

Hey, not to spam, but I implemented these two callbacks (and more), which are available at this repository and can be used in any kind of training loop. See this thread for further discussion.

I am also porting ReduceLROnPlateau. This should be pretty light and straightforward. https://github.com/Jiaming-Liu/pytorch-lr-scheduler
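
I haven’t checked the port’s exact constructor arguments, so treat this as a rough sketch of the usage pattern, written against the ReduceLROnPlateau that later landed in torch.optim.lr_scheduler; the linked port may differ in details.

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Multiply the LR by 0.1 once the monitored metric stops improving for 5 epochs.
scheduler = ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=5)

for epoch in range(20):
    # ... train for one epoch ...
    val_loss = 1.0 / (epoch + 1)   # stand-in for a real validation loss
    scheduler.step(val_loss)       # decays the LR when val_loss plateaus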

I know that it is “Better to Remain Silent and Be Thought a Fool than to Speak and Remove All Doubt”, but I still don’t know how to save the optimizer internals along with the model so that everything resumes as if nothing had happened.

I would appreciate an answer with code.

Thanks.

A PR has been merged to implement LR schedulers here. This is not part of a release yet. For now, you can use LR scheduling like this:

def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7):
    """Decay learning rate by a factor of 0.1 every lr_decay_epoch epochs."""
    lr = init_lr * (0.1**(epoch // lr_decay_epoch))

    if epoch % lr_decay_epoch == 0:
        print('LR is set to {}'.format(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return optimizer
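
For context, a minimal sketch of how that function would sit in a training loop; the model, data, and loop sizes here are placeholders for illustration.

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = nn.MSELoss()

for epoch in range(20):
    # Re-apply the schedule once per epoch; every 7 epochs the LR drops by 10x.
    optimizer = exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7)
    for _ in range(5):                       # stand-in for iterating over a DataLoader
        inputs, targets = torch.randn(8, 10), torch.randn(8, 2)
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()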

I think that the return statement should be return optimizer instead of return optimiser 🙂

Yep! Edited it! Damn autocorrect.

Nice!
One more thing: this scheduler can only be used if all param groups share the same learning rate.
I would change the method to:

def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Decay learning rate by a factor of lr_decay every lr_decay_epoch epochs."""
    # Only decay on positive multiples of lr_decay_epoch (not at epoch 0).
    if epoch == 0 or epoch % lr_decay_epoch:
        return optimizer

    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay
    return optimizer
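
To make the difference concrete, a sketch with two param groups at different learning rates; the backbone/head split and the LR values are just an example setup.

import torch.nn as nn
import torch.optim as optim

backbone, head = nn.Linear(10, 10), nn.Linear(10, 2)
optimizer = optim.SGD([
    {'params': backbone.parameters(), 'lr': 0.001},  # pretrained part: small LR
    {'params': head.parameters(), 'lr': 0.01},       # fresh part: larger LR
], momentum=0.9)

for epoch in range(20):
    # Each group keeps its own LR; both are scaled by lr_decay every 7 epochs.
    optimizer = exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7)
    # ... training iterations ...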

You don’t need the initial learning rate as a parameter here, because the optimizer already received it in its constructor, so every param group already starts from its initial learning rate. 🙂

Hi.

Thanks for the answers.
But how would this function be used, and what would be saved to disk besides the model?

Thanks.

For an example of LR scheduling and saving the model, you can refer to the ImageNet example or the transfer learning tutorial.
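
Since code was asked for, here is a rough sketch of the checkpointing part; the dictionary keys and file name are my own choice, not fixed by any API.

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
epoch = 5  # whatever epoch training stopped at

# Save both the weights and the optimizer internals (momentum buffers,
# per-group learning rates), so training can resume as if uninterrupted.
torch.save({'epoch': epoch,
            'model_state': model.state_dict(),
            'optimizer_state': optimizer.state_dict()},
           'checkpoint.pth')

# Resume: rebuild the model and optimizer the same way, then restore state.
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state'])
optimizer.load_state_dict(checkpoint['optimizer_state'])
start_epoch = checkpoint['epoch'] + 1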

Thanks.

I will study them.

So the answer seems to be to save both the network and the optimizer.
I will try it.

Thanks.

I would love to see these functions added to master; it probably doesn’t make sense for everyone to keep rewriting this on their own.

They are. See PR #1370.

Any update on this?
What would be the correct way to do so in the latest version (0.2)?

Thank You.

With this:
http://pytorch.org/docs/master/_modules/torch/optim/lr_scheduler.html#ExponentialLR
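
For reference, a minimal sketch of that built-in scheduler’s usage pattern; the model and the gamma value are placeholders.

import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import ExponentialLR

model = nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = ExponentialLR(optimizer, gamma=0.9)   # lr <- lr * 0.9 per epoch

for epoch in range(20):
    # ... train for one epoch ...
    scheduler.step()   # apply the exponential decay once per epoch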