Adaptive learning rate

That’s the point. Maybe there’s a good way to save the current state of the optimizer, call the optimizer constructor with the new value of lr, and assign the old state values to the new instance. In that case we would end up with a copy of the old optimizer, just with the lr adjusted.
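
A minimal sketch of that idea, assuming an SGD optimizer and a PyTorch version that provides Optimizer.state_dict() / load_state_dict(); the nn.Linear here is just a stand-in model:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # stand-in model for the sketch
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# ... train for a while; the momentum buffers get populated ...

new_lr = 0.01

# Grab the internal state and patch the lr inside it, because
# load_state_dict() also restores the old param_groups (including 'lr').
state = optimizer.state_dict()
for group in state['param_groups']:
    group['lr'] = new_lr

# Reconstruct the optimizer with the new lr and hand the old state back to it.
optimizer = optim.SGD(model.parameters(), lr=new_lr, momentum=0.9)
optimizer.load_state_dict(state)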

1 Like

Have you found any way to update the learning rate? Being able to update the lr is critical for me.

2 Likes

@apaszke @smth

Keras provides two callbacks which are fairly straightforward to implement, and everyone loves them:

This one reduces the LR when the monitored metric has been stuck on a plateau for the past “X=patience” epochs:

ReduceLROnPlateau(monitor='loss_value', factor=np.sqrt(0.1), cooldown=0, patience=10, min_lr=0.5e-6, verbose=1)

This one stops you from burning up your Amazon AWS $$$ credits if your model is not learning anything after “X=patience” epochs:

EarlyStopping(monitor='accuracy_value', min_delta=0.0001, patience=15, verbose=1)

Would PyTorch be open to adding something like this?
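
For reference, a rough sketch of what a plateau-based reducer could look like built directly on the optimizer’s param_groups (the same trick discussed later in this thread); the class name, defaults, and “lower is better” convention are illustrative assumptions, not an existing PyTorch API:

class SimplePlateauReducer:
    """Illustrative sketch: multiply the lr by `factor` once the monitored
    value (lower is better) has not improved for `patience` epochs."""

    def __init__(self, optimizer, factor=0.1, patience=10, min_lr=1e-6):
        self.optimizer = optimizer
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float('inf')
        self.num_bad_epochs = 0

    def step(self, metric):
        if metric < self.best:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1

        if self.num_bad_epochs > self.patience:
            for group in self.optimizer.param_groups:
                group['lr'] = max(group['lr'] * self.factor, self.min_lr)
            self.num_bad_epochs = 0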

19 Likes

Any update on this thread about learning rate decay?

@apaszke the concern here is definitely about losing the running averages, etc., maintained by the optimizer. As you say, the cost of constructing it is negligible.

In the models I’m training right now, I see an increase in the loss when I construct a new optimizer to decay the learning rate.

1 Like

You can change the lr in the way @trypag mentioned:

for param_group in optimizer.param_groups:
    param_group['lr'] = lr

No state is lost anymore.

2 Likes

Hi,

I asked @apaszke about this on Slack. See the exchange below:

chsasank Apr 11, 2017 18:11
Hi @apaszke, can you clarify how to change the LR? You have given different answers:

Is there any way to decay the learning rate for optimisers? (slack)

check out the imagenet example (This uses param_groups)

Adaptive learning rate

If you want to change the LR we recommend reconstructing the optimizer with new parameters.

apaszke Apr 11, 2017 19:01
Both ways are OK. The second one is simpler, but it will clear the momentum buffers, and if you use ada* optimizers your model might blow up if your default lr is too large.
Now I’m leaning towards the first one.

I hope this clears things up.

It’s likely that the effective lr had been lowered by the optimizer’s internal state, and the new optimizer doesn’t know about it, so it applies updates that are too large for a few iterations. You can always safely modify the optimizer’s param_groups, but I guess we’ll need to figure out a better way.

Yeah, a cleaner method would be appreciated. I’m using @trypag’s method now, but it seems brittle to modify the internals of the optimizer and redundant for everyone to have to write this themselves.

1 Like

How about more general situations where one wants to adaptively adjust any parameter defined in the model? Is it practical to reconstruct the optimizer in every loop? Maybe a mechanism like a placeholder?

Thanks!

@lliu25 in that case, define a Python function that changes the param_groups as you wish (after every iteration or epoch). Placeholders are unnecessary.
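
As a concrete illustration of that suggestion, here is a small sketch; the schedule values are made up, and an SGD optimizer is assumed so that each group carries both 'lr' and 'momentum':

def adjust_hyperparams(optimizer, epoch):
    """Change any entries of the optimizer's param_groups after each epoch."""
    for group in optimizer.param_groups:
        group['lr'] = 0.1 * (0.95 ** epoch)                  # made-up lr schedule
        group['momentum'] = min(0.99, 0.90 + 0.001 * epoch)  # made-up momentum ramp
    return optimizer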

3 Likes

@smth True. That does make placeholders extraneous.

Hey, not to spam, but I implemented these two callbacks (and more), which are available at this repository and can be used in any kind of training loop. See this thread for further discussion.

2 Likes

I am also porting ReduceLROnPlateau. This should be pretty light and straightforward. https://github.com/Jiaming-Liu/pytorch-lr-scheduler

11 Likes

I know that it is “Better to Remain Silent and Be Thought a Fool than to Speak and Remove All Doubt”, but I still don’t know how to save the optimizer internals along with the model so that everything resumes as if nothing had happened.

I would appreciate an answer with code.

Thanks.

A PR has been merged to implement LR schedulers here. This is not part of a release yet. For now, you can use LR scheduling like this:

def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7):
    """Decay learning rate by a factor of 0.1 every lr_decay_epoch epochs."""
    lr = init_lr * (0.1**(epoch // lr_decay_epoch))

    if epoch % lr_decay_epoch == 0:
        print('LR is set to {}'.format(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return optimizer

10 Likes

I think that the return statement should be return optimizer instead of return optimiser :slight_smile:

1 Like

Yep! Edited it! Damn autocorrect.

1 Like

Nice!
Another thing is that this scheduler can be used only if all groups have the same learning rate.
I would change the method to:

def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Decay the learning rate by a factor of lr_decay every lr_decay_epoch epochs."""
    if epoch == 0 or epoch % lr_decay_epoch != 0:
        return optimizer

    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay
    return optimizer

You don’t need the initial learning rate as a parameter here because the optimizer already holds the initial learning rate passed to its constructor. So all the learning rates start out at that initial value :slight_smile:
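
For completeness, a sketch of how this scheduler would typically be called, once per epoch inside the training loop; the nn.Linear model and the empty training step are placeholders:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

for epoch in range(30):
    # ... forward / backward / optimizer.step() for one epoch goes here ...
    optimizer = exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7)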

9 Likes

Hi.

Thanks for the answers.
But how would this function be used, and what would be saved to disk besides the model?

Thanks.
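
To make the resume story concrete: besides the model weights, you would save the optimizer’s state_dict (which holds the current lr and the momentum/averaging buffers) and the epoch counter, so the schedule can pick up where it left off. A minimal sketch, assuming model, optimizer, and epoch come from your training loop and the file name is arbitrary:

import torch

# Save a checkpoint at the end of an epoch.
torch.save({
    'epoch': epoch,
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),  # includes lr and internal buffers
}, 'checkpoint.pth')

# Later, to resume training:
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state'])
optimizer.load_state_dict(checkpoint['optimizer_state'])
start_epoch = checkpoint['epoch'] + 1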