Adaptive learning rate

davidenitti · February 3, 2017, 12:12pm

How do I change the learning rate of an optimizer during the training phase?

thanks

trypag · February 3, 2017, 12:24pm

See next comment to match @apaszke observation

apaszke · February 3, 2017, 1:32pm

No! This doesn’t work anymore! The example has been updated to match the new semantics of state_dict.

If you want to change the LR we recommend reconstructing the optimizer with new parameters.

trypag · February 3, 2017, 2:29pm

Oh, ok, sorry! Things move fast.

def adjust_learning_rate(optimizer, epoch):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = args.lr * (0.1 ** (epoch // 30))
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
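
The snippet above depends on a global `args.lr`. Here is a minimal self-contained sketch of the same schedule, assuming the base learning rate is passed in explicitly. The names `step_decay_lr` and `set_lr` are hypothetical, and the `SimpleNamespace` object is only a stand-in so the block runs without PyTorch installed; a `torch.optim` optimizer exposes the same `param_groups` list of dicts.

```python
from types import SimpleNamespace


def step_decay_lr(base_lr, epoch, factor=0.1, step=30):
    """Return base_lr multiplied by `factor` once for every `step` epochs elapsed."""
    return base_lr * (factor ** (epoch // step))


def set_lr(optimizer, lr):
    """Write `lr` into every parameter group of the optimizer, in place."""
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr


# Stand-in for a torch.optim optimizer (assumption: only `param_groups` is used).
optimizer = SimpleNamespace(param_groups=[{'lr': 0.1}, {'lr': 0.1}])

for epoch in range(90):
    set_lr(optimizer, step_decay_lr(0.1, epoch))
    # ... train for one epoch here ...

final_lrs = [group['lr'] for group in optimizer.param_groups]
```

Because only the `lr` entry of each group is overwritten, any state the optimizer keeps (momentum buffers, running averages) is left untouched.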

ecolss · March 11, 2017, 2:04am

Learning rate decay is a common need during model training, right?
So the current PyTorch optim package doesn't provide this?
As for the recommendation, I wonder whether reconstructing the optimizer with new parameters would add some performance overhead, even if it would be very small compared to the whole training time.

apaszke · March 11, 2017, 10:27am

@ecolss creating an optimizer is nearly free (on the order of milliseconds), especially since you do it at most once per epoch, and jobs that take a number of days to complete often have only a few epochs. Don’t optimize prematurely.

ruotianluo · March 12, 2017, 1:47am

@apaszke However, if you create a new optimizer, the previous momentum (or other state) will be discarded.

lysuhin · March 27, 2017, 11:48am

That’s the point. Maybe there’s a good way to save the current state of the optimizer, call the optimizer constructor with the new value of lr, and assign the old state values to the new one. In that case we would have a copy of the old optimizer with only the lr adjusted.
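
One way this idea could look, as a sketch rather than a recommended recipe: take the old optimizer's `state_dict()`, overwrite `lr` in the snapshot, and load it into a freshly constructed optimizer. This assumes a reasonably recent PyTorch release in which `load_state_dict` restores both the per-parameter state and the hyperparameters stored in `param_groups`. The toy model and the values are made up for illustration.

```python
import torch

model = torch.nn.Linear(4, 1)
old_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take one step so the optimizer has momentum buffers worth preserving.
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
old_opt.step()

# Snapshot the optimizer and overwrite the learning rate in the snapshot.
new_lr = 0.01
snapshot = old_opt.state_dict()
for group in snapshot['param_groups']:
    group['lr'] = new_lr

# The lr passed to the constructor is a placeholder; the loaded snapshot replaces it.
new_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
new_opt.load_state_dict(snapshot)

weight = model.weight  # used below to look up the per-parameter state
```

After loading, `new_opt.state[weight]['momentum_buffer']` holds the same values as in the old optimizer, while `new_opt.param_groups[0]['lr']` is the new rate.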

mderakhshani · March 29, 2017, 4:13pm

Have you found any way to update learning rate? It is so critical for me to update the lr.

FuriouslyCurious · March 31, 2017, 3:19am

@apaszke @smth

Keras provides two functions which are fairly straightforward to implement, and everyone loves them:

This one reduces the LR when the monitored quantity has been stuck on a plateau for the past “X=patience” epochs:

ReduceLROnPlateau(monitor='loss_value', factor=np.sqrt(0.1), cooldown=0, patience=10, min_lr=0.5e-6, verbose=1)

This one stops you from burning up your Amazon AWS $$$ credits if your model is not learning anything after “X=patience” epochs:

EarlyStopping(monitor='accuracy_value', min_delta=0.0001, patience=15, verbose=1)

Would PyTorch be open to adding something like this?
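
A rough sketch of how the two callbacks could be written by hand on top of `param_groups`. This is loosely modeled on the Keras behavior described above, not a port of it: the class names `PlateauLRReducer` and `EarlyStopper` are hypothetical, both monitor a loss where lower is better (the post monitors accuracy for early stopping), and the `SimpleNamespace` object stands in for a real optimizer so the block runs without PyTorch.

```python
from types import SimpleNamespace


class PlateauLRReducer:
    """Multiply the lr by `factor` after `patience` epochs without improvement."""

    def __init__(self, optimizer, factor=0.1, patience=10, min_lr=0.0):
        self.optimizer = optimizer
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float('inf')
        self.bad_epochs = 0

    def step(self, loss):
        if loss < self.best:
            self.best = loss
            self.bad_epochs = 0
            return
        self.bad_epochs += 1
        if self.bad_epochs >= self.patience:
            for group in self.optimizer.param_groups:
                group['lr'] = max(group['lr'] * self.factor, self.min_lr)
            self.bad_epochs = 0  # start counting again after a reduction


class EarlyStopper:
    """Signal a stop once the loss fails to improve by `min_delta` for `patience` epochs."""

    def __init__(self, patience=15, min_delta=1e-4):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float('inf')
        self.bad_epochs = 0

    def should_stop(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Stand-in optimizer and a made-up loss curve that plateaus at 0.8.
optimizer = SimpleNamespace(param_groups=[{'lr': 0.1}])
reducer = PlateauLRReducer(optimizer, factor=0.5, patience=2)
stopper = EarlyStopper(patience=3)

stopped_at = None
for epoch, loss in enumerate([1.0, 0.8, 0.8, 0.8, 0.8, 0.8]):
    reducer.step(loss)
    if stopper.should_stop(loss):
        stopped_at = epoch
        break
```

Later PyTorch releases ship a built-in `torch.optim.lr_scheduler.ReduceLROnPlateau`, which is likely the better choice where it is available.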

ecolss · April 7, 2017, 2:48am

Any update on this thread about learning rate decay?

will · April 12, 2017, 12:45am

@apaszke the concern here is definitely about losing the running averages, etc. maintained by the optimizer. As you say, the cost of constructing it is negligible.

In the models I’m training right now I see an increase in the loss when I construct a new optimizer to decay the learning rate.

Rongzhao_Zhan · April 12, 2017, 4:10am

You can change the lr this way, as @trypag mentioned:

for param_group in optimizer.param_groups:
    param_group['lr'] = lr

There is no state loss with this approach.

chsasank · April 12, 2017, 5:32am

Hi,

I have asked @apaszke about this on slack. See this.

chsasank Apr 11, 2017 18:11
Hi @apaszke, can you clarify how to change the LR? You have given different answers:

Is there any way to decay the learning rate for optimisers? (slack)

check out the imagenet example (This uses param_groups)

Adaptive learning rate - #3 by apaszke

If you want to change the LR we recommend reconstructing the optimizer with new parameters.

apaszke Apr 11, 2017 19:01
both ways are ok. second one is simpler, but will clear momentum buffers + if you use ada* optimizers your model might blow up if your default lr is too large
now I’m leaning towards the first one

I hope this clears things up.

apaszke · April 12, 2017, 2:12pm

It’s likely that the lr was lowered by the optimizer and the new one doesn’t know about it, so it applies updates that are too large for a few iterations. You can always safely modify the param_groups dicts, but I guess we’ll need to figure out a better way.

will · April 12, 2017, 5:13pm

Yeah, a cleaner method would be appreciated. I’m using @trypag’s method now, but it seems brittle to modify the internals of the optimizer and redundant for everyone to have to write this themselves.

lliu25 · April 14, 2017, 1:56pm

How about more general situations where you want to adaptively adjust any parameter defined in the model? Is it practical to reconstruct the optimizer in every loop? Maybe a mechanism like a placeholder?

Thanks!

smth · April 16, 2017, 9:23pm

@lliu25 in that case, define a Python function that changes the param_groups as you wish (after every iteration or epoch). Placeholders are unnecessary.
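
A small sketch of such a function, generalized from the learning rate to any per-group hyperparameter. The helper name `set_hyperparameter` and the weight-decay ramp are hypothetical, and the `SimpleNamespace` object again stands in for a real optimizer so the block runs without PyTorch.

```python
from types import SimpleNamespace


def set_hyperparameter(optimizer, name, value):
    """Set `name` to `value` in every parameter group; reject unknown names."""
    for group in optimizer.param_groups:
        if name not in group:
            raise KeyError(f"optimizer has no hyperparameter named {name!r}")
        group[name] = value


# Stand-in for a torch.optim optimizer (assumption: only `param_groups` is used).
optimizer = SimpleNamespace(
    param_groups=[{'lr': 0.1, 'momentum': 0.9, 'weight_decay': 0.0}]
)

for epoch in range(3):
    # Made-up schedule: ramp weight decay up linearly with the epoch index.
    set_hyperparameter(optimizer, 'weight_decay', 1e-4 * epoch)
    # ... train for one epoch here ...
```

Rejecting unknown keys guards against typos such as `'learning_rate'`, which would otherwise add an entry the optimizer never reads.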

lliu25 · April 16, 2017, 10:52pm

@smth True. That does make placeholders unnecessary.

ncullen93 · April 21, 2017, 12:36am

Hey, not to spam, but I implemented these two callbacks (and more), which are available at this repository and can be used in any kind of training loop. See this thread for further discussion.