"When last_epoch=-1, sets initial lr as lr."

Good day!

Most of the learning rate schedulers have this sentence in their documentation:

When last_epoch=-1, sets initial lr as lr.

But what does it mean exactly?

Thanks!

The last_epoch argument is used to resume training after N epochs, as seen in this example:

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=-1)

for _ in range(6):
    print(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
# 1.0
# 1.0
# 0.5
# 0.5
# 0.25
# 0.25
print("==============")
opt_sd = optimizer.state_dict()

# resume training: the loaded optimizer state already contains the
# 'initial_lr' entry which StepLR requires when last_epoch != -1
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
optimizer.load_state_dict(opt_sd)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=6)

for _ in range(6):
    print(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
# 0.125
# 0.0625
# 0.0625
# 0.03125
# 0.03125
# 0.015625
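
As a side note: instead of passing last_epoch manually, you can also checkpoint and restore the scheduler itself via its state_dict, which avoids the off-by-one bookkeeping around last_epoch. A minimal sketch (same SGD/StepLR setup as above; the ckpt dict is only an illustration, in practice you would torch.save it):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

for _ in range(6):
    optimizer.step()
    scheduler.step()

# save both states
ckpt = {"optimizer": optimizer.state_dict(), "scheduler": scheduler.state_dict()}

# resume: recreate the objects with the default last_epoch=-1, then restore
model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])

print(optimizer.param_groups[0]["lr"])
# 0.125, continuing exactly where the first run stopped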

@ptrblck , thanks very much!
(UPD 2.)
Before the next questions: is it expected that 0.125 appears only once?
Should we have read "last epoch" as "last finished epoch" and therefore written last_epoch=5?

Setting last_epoch=5 will continue with:

0.0625
0.0625
0.03125
0.03125
0.015625
0.015625

and skip 0.125 entirely. @albanD, do you know what the issue in my code snippet is?
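
One thing that can be checked directly: constructing the scheduler with last_epoch=6 immediately advances the internal counter by one. A minimal sketch (assuming scheduler.last_epoch reflects the internal epoch counter, and that an 'initial_lr' entry must be present in the param groups when last_epoch != -1):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
optimizer.param_groups[0]["initial_lr"] = 1.  # required when last_epoch != -1

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=6)
print(scheduler.last_epoch)
# 7 -- the constructor already performed one implicit .step()

So the construction itself advances past epoch 6, and the loop prints the 0.125 lr only once more (at epoch 7) before the next decay, which would explain the single 0.125.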

@ptrblck ,

  1. It seems the last_epoch argument simply skips last_epoch+1 steps when the scheduler object is initialized.
  2. It doesn't matter how many steps were actually performed before the scheduler was (re-)initialized:

So, for any value of the last_epoch argument, this "scrolling" ("skipping") starts from the "initial learning rate" (literally), i.e. the earlier stepping doesn't change it.

(To check this and the point above, set step_size=3 and vary the last_epoch argument in the second part of your code.)

  3. Keep in mind that the scheduler object initialization performs a .step() itself (see the sketch right after this list).
  4. Keep in mind that the numbering starts at 0.

(Programmatically, these steps occur in the order: 2., 1., 3.)
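
A quick illustration of pp. 3. and 4. (a sketch; scheduler.last_epoch is an implementation detail I am reading here, not a documented API):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)

# p. 3: the constructor itself already called .step() once ...
# p. 4: ... which moved last_epoch from -1 to 0, i.e. the numbering starts at 0
print(scheduler.last_epoch)
# 0
print(optimizer.param_groups[0]["lr"])
# 1.0 -- the implicit step at epoch 0 keeps the initial lr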


The question is: does

When last_epoch=-1, sets initial lr as lr.

refer to point 1 or to point 2?

Or, if neither, what does it actually describe?

Does the documentation mention these statements anywhere?
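
For what it's worth, the only behavior I can find that literally matches that sentence: with last_epoch=-1, the constructor copies each param group's lr into an 'initial_lr' entry. A sketch (initial_lr is an implementation detail observed here, not something the documentation spells out):

import torch
import torch.nn as nn

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1.)
print("initial_lr" in optimizer.param_groups[0])
# False

scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5, last_epoch=-1)
print(optimizer.param_groups[0]["initial_lr"])
# 1.0 -- copied from lr at construction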

Thanks!

@ptrblck , @albanD , any thoughts? Thanks.

Continued at `torch.optim.lr_scheduler.LRScheduler`: lack of documentation · Issue #120735 · pytorch/pytorch · GitHub.