Good day!
Most of the learning rate schedulers have the following line in their documentation:
When last_epoch=-1, sets initial lr as lr.
But what does it mean exactly?
Thanks!
The last_epoch argument is used to resume training after N epochs, as seen in this example:
import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=-1)

for _ in range(6):
    print(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
# 1.0
# 1.0
# 0.5
# 0.5
# 0.25
# 0.25
print("==============")
opt_sd = optimizer.state_dict()
# resume training
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
optimizer.load_state_dict(opt_sd)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=6)
for _ in range(6):
    print(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
# 0.125
# 0.0625
# 0.0625
# 0.03125
# 0.03125
# 0.015625
@ptrblck , thanks very much!

(UPD 2.)

Before the next questions: is it expected that 0.125 appears only once in the resumed run? Should we have read "last epoch" as "last finished epoch" and written last_epoch=5 instead?
Setting last_epoch=5 will continue with:

0.0625
0.0625
0.03125
0.03125
0.015625
0.015625

and entirely skip 0.125 (a minimal check is sketched below).
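Here is a minimal sketch of that check: the same resume code as above, only with last_epoch=5 (opt_sd is the optimizer state dict saved after the first run):

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
optimizer.load_state_dict(opt_sd)
# Resume as if epoch 5 was the last finished epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=5)

for _ in range(6):
    print(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
# 0.0625
# 0.0625
# 0.03125
# 0.03125
# 0.015625
# 0.015625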
@albanD do you know what the issue in my code snippet is?
@ptrblck ,

1. The last_epoch argument just omits last_epoch+1 steps on the scheduler object's initialization.

2. So, on any last_epoch argument value, this "scrolling" ("omitting") starts with the "initial learning rate" (literally), i.e. the stepping doesn't change it. (To check this and the point above, set step_size=3 and vary the last_epoch argument in the second part of your code.)

3. The very first learning-rate value is set not by an explicit scheduler.step() call but by the scheduler's constructor, which calls .step() itself: step 0. (Programmatically, these steps occur in the order: 2., 1., 3.)
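A minimal sketch of how these points can be observed (last_epoch and initial_lr are the attribute and param-group key PyTorch actually uses; the rest mirrors the snippet above):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)

# Point 3: the constructor performs a step itself, so the very first lr value
# is set before any explicit scheduler.step() call.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=-1)
print(scheduler.last_epoch)                     # 0   (the constructor stepped from -1 to 0)
# Point 2: the "initial learning rate" is recorded in the param group.
print(optimizer.param_groups[0]["initial_lr"])  # 1.0

# Point 1: passing last_epoch=N makes the scheduler act as if N+1 steps had
# already happened; the constructor's own step jumps straight to N+1.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=6)
print(scheduler.last_epoch)                     # 7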
The question is: does

When last_epoch=-1, sets initial lr as lr.

refer to point 1. or point 2.? If neither, what does it tell about instead? Does the documentation cover these three statements at all?

Thanks!