Adaptive learning rate

How do I change the learning rate of an optimizer during the training phase?

thanks

See the next comment for a version that matches @apaszke's observation.

No! This doesn't work anymore! The example has been updated to match the new semantics of state_dict.

If you want to change the LR we recommend reconstructing the optimizer with new parameters.

Oh, OK, sorry! Things move fast.

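def adjust_learning_rate(optimizer, epoch, initial_lr):
    """Sets the learning rate to the initial LR decayed by 10 every 30 epochs"""
    lr = initial_lr * (0.1 ** (epoch // 30))
    # Update the lr in place in every param group; the optimizer's state
    # (e.g. momentum buffers) is left untouched.
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr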
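
Current PyTorch versions ship this same step-decay schedule as torch.optim.lr_scheduler.StepLR, which, like the function above, edits param_groups in place and leaves the optimizer state alone. A minimal sketch, assuming a toy nn.Linear model, SGD with momentum and an initial lr of 0.1 (all chosen only for illustration); the actual training work is elided:

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Multiply every group's lr by gamma=0.1 every step_size=30 epochs,
# i.e. lr = 0.1 * 0.1 ** (epoch // 30).
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

lrs = []
for epoch in range(90):
    lrs.append(optimizer.param_groups[0]['lr'])  # lr used during this epoch
    # ... train for one epoch here: forward, backward, optimizer.step() ...
    optimizer.step()
    scheduler.step()  # once per epoch, after the optimizer has stepped
```

The scheduler is stepped once per epoch after optimizer.step(), so lrs[e] follows the same 0.1 * 0.1 ** (e // 30) schedule as adjust_learning_rate.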

Learning rate decay is a common need during model training, right?
So we don't have this in the current PyTorch optim package?
Following your recommendation, I wonder whether reconstructing the optimizer with new parameters would add some performance overhead, although it would be very small compared to the whole training time.

@ecolss Creating an optimizer is nearly free (on the order of milliseconds), especially since you do it at most once per epoch, and jobs that take days to complete often have only a few epochs. Don't optimize prematurely.

@apaszke However, if you create a new optimizer, the previous momentum (or other state) will be discarded.

That's the point. Maybe there's a good way to save the current state of the optimizer, call the optimizer's constructor with the new lr value, and assign the old state values to the new one. That way, we would have a copy of the old optimizer with only the lr adjusted.
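
The save/reconstruct/restore idea can be sketched with state_dict() and load_state_dict(). This is a minimal sketch assuming SGD with momentum and a toy nn.Linear model, both only for illustration. One thing worth knowing: load_state_dict restores the saved hyperparameters as well as the buffers, so the new lr has to be set through param_groups afterwards anyway.

```python
import torch

model = torch.nn.Linear(10, 2)
old_optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# One training step so that SGD creates its momentum buffers.
loss = model(torch.randn(8, 10)).pow(2).mean()
loss.backward()
old_optimizer.step()

# Save the old state, build a fresh optimizer with the new lr, copy the state over.
saved_state = old_optimizer.state_dict()
new_optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
new_optimizer.load_state_dict(saved_state)

# Caveat: load_state_dict also restored the saved hyperparameters, so the lr
# is back to 0.1 here and must be set again through param_groups.
for group in new_optimizer.param_groups:
    group['lr'] = 0.01
```

In practice this ends up doing the same thing as editing old_optimizer.param_groups directly, with an extra copy of the state.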

Have you found a way to update the learning rate? Being able to update the lr is critical for me.

@apaszke @smth

Keras provides two callbacks that are fairly straightforward to implement, and everyone loves them:

This one reduces the LR when the monitored quantity (e.g. the loss) has been stuck on a plateau for the past "X=patience" epochs:

ReduceLROnPlateau(monitor='loss_value', factor=np.sqrt(0.1), cooldown=0, patience=10, min_lr=0.5e-6, verbose=1)
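
Current PyTorch versions include a counterpart, torch.optim.lr_scheduler.ReduceLROnPlateau. The sketch below maps the Keras arguments above onto it (factor, patience, cooldown, min_lr). The helper name make_plateau_scheduler, the toy model and data, and monitoring the training loss are all assumptions for illustration only.

```python
import math
import torch

def make_plateau_scheduler(optimizer):
    # Roughly the Keras settings above: multiply the lr by sqrt(0.1) once the
    # monitored value has not improved for more than 10 epochs, never below 0.5e-6.
    return torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=math.sqrt(0.1), patience=10,
        cooldown=0, min_lr=0.5e-6)

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = make_plateau_scheduler(optimizer)

x, y = torch.randn(64, 10), torch.randn(64, 1)
for epoch in range(50):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    # Unlike most schedulers, step() takes the monitored value (here the
    # training loss; a validation loss is the more usual choice).
    scheduler.step(loss.item())
```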

This one stops you from burning up your Amazon AWS $$$ credits if your model is not learning anything after "X=patience" epochs:

EarlyStopping(monitor='accuracy_value', min_delta=0.0001, patience=15, verbose=1)
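
As far as I know, core PyTorch does not provide an early-stopping helper, but one is easy to write in plain Python. The sketch below is a hypothetical EarlyStopping class (not a torch API), loosely modeled on the Keras callback: it tracks the best value of a metric and reports when patience epochs have passed without an improvement larger than min_delta.

```python
class EarlyStopping:
    """Hypothetical helper, not part of torch: tracks the best value of a
    monitored metric and signals when to stop training."""

    def __init__(self, patience=15, min_delta=1e-4, mode="max"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience    # epochs to wait without improvement
        self.min_delta = min_delta  # smallest change that counts as improvement
        self.mode = mode            # "max" for an accuracy, "min" for a loss
        self.best = None
        self.num_bad_epochs = 0

    def _is_improvement(self, value):
        if self.best is None:
            return True
        if self.mode == "max":
            return value > self.best + self.min_delta
        return value < self.best - self.min_delta

    def step(self, value):
        """Record one epoch's metric; return True once training should stop."""
        if self._is_improvement(value):
            self.best = value
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        return self.num_bad_epochs >= self.patience
```

In a training loop you would create stopper = EarlyStopping(patience=15, min_delta=0.0001, mode="max") and break out of the epoch loop as soon as stopper.step(val_accuracy) returns True, where val_accuracy stands for whatever metric you compute each epoch.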

Would PyTorch be open to adding something like this?

Any update on this thread about learning rate decay?

@apaszke The concern here is definitely about losing the running averages etc. maintained by the optimizer. As you say, the cost of constructing it is negligible.

In the models I'm training right now, I see an increase in the loss when I construct a new optimizer to decay the learning rate.

You can change the lr this way, as @trypag mentioned:

for param_group in optimizer.param_groups:
    param_group['lr'] = lr

No optimizer state is lost this way.

Hi,

I asked @apaszke about this on Slack. See below:

chsasank Apr 11, 2017 18:11
Hi @apaszke, can you clarify how to change the LR? You have given different answers:

Is there any way to decay the learning rate for optimisers? (slack)

Check out the ImageNet example (this uses param_groups).

Adaptive learning rate - #3 by apaszke

If you want to change the LR we recommend reconstructing the optimizer with new parameters.

apaszke Apr 11, 2017 19:01
Both ways are OK. The second one is simpler, but it will clear the momentum buffers, and if you use Ada* optimizers your model might blow up if your default lr is too large.
Now I'm leaning towards the first one.

I hope this clears things up.

It's likely that the lr was lowered by the optimizer and the new one doesn't know about it, so it applies too-large updates for a few iterations. You can always safely modify the param_groups dicts, but I guess we'll need to figure out a better way.

Yeah, a cleaner method would be appreciated. I'm using @trypag's method now, but it seems brittle to modify the internals of the optimizer and redundant for everyone to have to write this themselves.

What about more general situations where we want to adaptively adjust any parameter defined in the model? Is it practical to reconstruct the optimizer in every iteration? Maybe a mechanism like a placeholder?

Thanks!

@lliu25 In that case, define a Python function that modifies optimizer.param_groups as you wish (after every iteration or epoch). Placeholders are unnecessary.
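
A minimal sketch of such a function, assuming SGD and a hypothetical helper name set_hyperparams: it overwrites any hyperparameter the optimizer keeps in its param_groups and is called before every step. The warm-up and momentum schedule are arbitrary and only for illustration.

```python
import torch

def set_hyperparams(optimizer, **hyperparams):
    """Hypothetical helper: overwrite hyperparameters (lr, momentum,
    weight_decay, ...) in every param group of an existing optimizer."""
    for group in optimizer.param_groups:
        # Reject unknown keys (and 'params') before changing anything.
        unknown = set(hyperparams) - (set(group) - {"params"})
        if unknown:
            raise KeyError(f"not hyperparameters of this optimizer: {sorted(unknown)}")
    for group in optimizer.param_groups:
        group.update(hyperparams)

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.5)
x, y = torch.randn(32, 10), torch.randn(32, 2)

for it in range(100):
    # Illustrative per-iteration schedule: linear lr warm-up over the first
    # 10 iterations, and momentum raised from 0.5 to 0.9 after that.
    set_hyperparams(optimizer,
                    lr=0.1 * min(1.0, (it + 1) / 10),
                    momentum=0.5 if it < 10 else 0.9)
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```

Because the optimizer reads its hyperparameters from param_groups on every step, no reconstruction is needed and the momentum buffers survive the changes.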

@smth True. That does make placeholders unnecessary.

Hey, not to spam, but I implemented these two callbacks (and more); they are available at this repository and can be used in any kind of training loop. See this thread for further discussion.
