I am also porting ReduceLROnPlateau. This should be pretty light and straightforward. https://github.com/Jiaming-Liu/pytorch-lr-scheduler
I know that it is "Better to Remain Silent and Be Thought a Fool than to Speak and Remove All Doubt", but I still don't know how to save the optimizer internals along with the model so everything resumes as if nothing had happened.
I would appreciate an answer with code.
Thanks.
A PR has been merged to implement LR schedulers here. This is not part of a release yet. For now, you can use LR scheduling like this:
def exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7):
    """Decay learning rate by a factor of 0.1 every lr_decay_epoch epochs."""
    lr = init_lr * (0.1 ** (epoch // lr_decay_epoch))

    if epoch % lr_decay_epoch == 0:
        print('LR is set to {}'.format(lr))

    for param_group in optimizer.param_groups:
        param_group['lr'] = lr

    return optimizer
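For context, here is a minimal usage sketch of that function; the model, optimizer setup, and epoch count below are illustrative placeholders, not from the original post:

import torch
import torch.optim as optim

model = torch.nn.Linear(10, 2)                        # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(25):
    # Re-set the learning rate for this epoch before training on it.
    optimizer = exp_lr_scheduler(optimizer, epoch, init_lr=0.001, lr_decay_epoch=7)
    # ... run one epoch of training with this optimizer ...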
I think that the return statement should be return optimizer
instead of return optimiser
Yep! Edited it! Damn autocorrect.
Nice!
Another thing is that this scheduler can be used only if all groups have the same learning rate.
I would change the method to:
def exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7):
    """Decay learning rate by a factor of lr_decay every lr_decay_epoch epochs."""
    if epoch % lr_decay_epoch:
        return optimizer

    for param_group in optimizer.param_groups:
        param_group['lr'] *= lr_decay

    return optimizer
You don't need the initial learning rate as a parameter here, because the optimizer already received it in its constructor, so all the learning rates are already initialized to that value.
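To illustrate the point about per-group learning rates, here is a hedged sketch; model.base and model.head are hypothetical submodules, and the numbers are arbitrary:

optimizer = torch.optim.SGD([
    {'params': model.base.parameters(), 'lr': 0.001},   # hypothetical submodule
    {'params': model.head.parameters(), 'lr': 0.01},    # hypothetical submodule
], lr=0.001, momentum=0.9)

for epoch in range(25):
    # Multiplicative decay preserves the ratio between the two groups' rates.
    optimizer = exp_lr_scheduler(optimizer, epoch, lr_decay=0.1, lr_decay_epoch=7)
    # ... train for one epoch ...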
Hi.
Thanks for the answers.
But how would this function be used, and what would be saved to disk besides the model?
Thanks.
For an example of LR scheduling and saving the model, you can refer to the ImageNet example or the transfer learning tutorial.
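In case it helps, a minimal checkpointing sketch along the lines of those examples; the file name, key names, and epoch variable are just placeholders:

# Saving: keep both the model's and the optimizer's state.
torch.save({
    'epoch': epoch,
    'model_state': model.state_dict(),
    'optimizer_state': optimizer.state_dict(),
}, 'checkpoint.pth')

# Resuming: rebuild the model and optimizer the same way, then restore their states.
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model_state'])
optimizer.load_state_dict(checkpoint['optimizer_state'])
start_epoch = checkpoint['epoch'] + 1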
Thanks.
I will study them.
So the answer seems to be to save the network and the optimizer.
I will try it.
Thanks.
I would love to see these functions get added to master; it probably doesn't make sense for everyone to keep rewriting this on their own.
They are. See PR #1370.
Any update on this?
What would be the correct way to do so in the latest version (0.2)?
Thank you.
@apaszke why reconstruct the optimizer instead of updating optimizer.param_groups?
Reconstructing the optimizer, i.e. optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate), can create a spike in the loss function, presumably because Adam's internal running statistics are reset...
What about using load_state_dict on a modified version of the optimizer's state dict?
def adjust_learning_rate(optimizer, iter, each):
    # Sets the learning rate to the initial LR decayed by 0.1 every 'each' iterations.
    lr = args.lr * (0.1 ** (iter // each))
    state_dict = optimizer.state_dict()
    for param_group in state_dict['param_groups']:
        param_group['lr'] = lr
    optimizer.load_state_dict(state_dict)
    return lr
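For completeness, a sketch of how this might be called in a training loop; total_iters and the decay interval are illustrative:

for iteration in range(total_iters):
    current_lr = adjust_learning_rate(optimizer, iteration, each=1000)
    # ... forward pass, backward pass, optimizer.step() ...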
Apologies for bumping this old thread; putting this here for future reference. The way to do this would be using the lr_scheduler classes (see the torch.optim documentation, PyTorch 2.1); a minimal StepLR sketch is at the end of this post.
Along with that, it seems several other scheduler options are available.
So the learning rate is stored in optim.param_groups[i]['lr']. optim.param_groups is a list of the different weight groups, which can have different learning rates. Thus, simply doing:

for g in optim.param_groups:
    g['lr'] = 0.001
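As mentioned at the top of this post, the built-in lr_scheduler covers the step-decay case directly; a minimal sketch with StepLR, where the model and optimizer setup are illustrative:

import torch
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(10, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
scheduler = StepLR(optimizer, step_size=7, gamma=0.1)  # multiply lr by 0.1 every 7 epochs

for epoch in range(25):
    # ... train for one epoch ...
    scheduler.step()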
Hey, how can I do this with libtorch in C++? I don't have param_groups in the optimizer.