A problem occurred when resuming an optimizer

model = Net()
optimizer1 = torch.optim.SGD(model.features.parameters(), lr=0.1, momentum=0.9, dampening=0.9)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer1,
                                                   gamma=0.999,
                                                   last_epoch=100)

After running this, the following error is raised:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>()
      3 scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer1,
      4                                                    gamma=0.999,
----> 5                                                    last_epoch=100)

~/anaconda3/envs/pytorch4/lib/python3.6/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, gamma, last_epoch)
    180     def __init__(self, optimizer, gamma, last_epoch=-1):
    181         self.gamma = gamma
--> 182         super(ExponentialLR, self).__init__(optimizer, last_epoch)
    183
    184     def get_lr(self):

~/anaconda3/envs/pytorch4/lib/python3.6/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
     18                 if 'initial_lr' not in group:
     19                     raise KeyError("param 'initial_lr' is not specified "
---> 20                                    "in param_groups[{}] when resuming an optimizer".format(i))
     21         self.base_lrs = list(map(lambda group: group['initial_lr'], optimizer.param_groups))
     22         self.step(last_epoch + 1)

KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"


You are trying to initialize a new optimizer and set the scheduler to another last_epoch.
As the optimizer wasn't used with the scheduler from the beginning, the param_group entry initial_lr is missing.
What is your exact use case?
Would you like to use the scheduler as if it had already run for 100 epochs?
If so, you could leave last_epoch=-1 (the default) in the instantiation and call the scheduler 100 times in a dummy for loop, as sketched below.
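
A minimal sketch of that dummy loop, assuming a toy stand-in for the original Net() (the model and layer sizes here are placeholders, not from the post above):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)  # placeholder for the original Net()
    optimizer1 = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, dampening=0.9)

    # Create the scheduler with the default last_epoch=-1 ...
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer1, gamma=0.999)

    # ... and step it 100 times so it behaves as if 100 epochs had already passed
    for _ in range(100):
        scheduler.step()

    print(optimizer1.param_groups[0]['lr'])  # 0.1 * 0.999**100, roughly 0.0905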


To follow up on that question: when we initialize a scheduler like

scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer1,
                                                   gamma=0.999,
                                                   last_epoch=100)

last_epoch is a user-facing argument, which suggests we can pass it any number instead of -1.
If we can't actually assign it another value at initialization, isn't this argument redundant?
I would prefer a design where the epoch state can be specified directly through the last_epoch argument.
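
As an aside, a non-default last_epoch is accepted once every param group already carries an initial_lr entry, which is what an optimizer restored from a scheduler-driven run would normally provide. A minimal sketch, where the manual assignment only stands in for a real checkpoint load:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)  # placeholder model
    optimizer1 = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, dampening=0.9)

    # Mimic what a resumed optimizer would already carry in each param group
    for group in optimizer1.param_groups:
        group.setdefault('initial_lr', group['lr'])

    # With 'initial_lr' present, last_epoch=100 no longer raises the KeyError
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer1,
                                                       gamma=0.999,
                                                       last_epoch=100)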


Getting this error myself in https://github.com/ultralytics/yolov3. The band-aid 'solution' was to set the attribute after the scheduler is already defined. I'm not sure whether the scheduler is actually initialized to the correct LR, but the code runs without errors in the second example below:

    # ERROR
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf, last_epoch=start_epoch - 1)

    # NO ERROR
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)
    scheduler.last_epoch = start_epoch - 1

The original error message is:

Traceback (most recent call last):
  File "train.py", line 423, in <module>
    train()  # train normally
  File "train.py", line 152, in train
    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf, last_epoch=start_epoch - 1)
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py", line 189, in __init__
    super(LambdaLR, self).__init__(optimizer, last_epoch)
  File "/usr/local/lib/python3.6/dist-packages/torch/optim/lr_scheduler.py", line 41, in __init__
    "in param_groups[{}] when resuming an optimizer".format(i))
KeyError: "param 'initial_lr' is not specified in param_groups[0] when resuming an optimizer"

Hi @glenn.jocher, I've also encountered the same error in https://github.com/ultralytics/yolov3 when resuming training. But I examined the source code of torch.optim.lr_scheduler.LambdaLR and found that the reason your code works is that LambdaLR takes the default value last_epoch = -1 and resets the lr. So the lr isn't loaded from last.pt; it is reset as if for a new optimizer.
https://s0pytorch0org.icopy.site/docs/0.2.0/_modules/torch/optim/lr_scheduler.html

class LambdaLR(_LRScheduler):
    """Sets the learning rate of each parameter group to the initial lr
    times a given function. When last_epoch=-1, sets initial lr as lr.
    ......
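
A quick check of that behaviour: with the default last_epoch=-1, the scheduler's __init__ writes initial_lr into the param groups itself, so nothing needs to be read back from a checkpoint. A small illustration with a placeholder model:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    print('initial_lr' in optimizer.param_groups[0])  # False before any scheduler exists

    # Default last_epoch=-1: __init__ copies the current lr into 'initial_lr'
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda e: 0.95 ** e)
    print(optimizer.param_groups[0]['initial_lr'])    # 0.01, taken from the fresh optimizer, not from last.pt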

The reason initial_lr is missing in last.pt is that the optimizer is not saved at the final epoch. I think the resume mechanism is designed for the case where training is interrupted, not for continuing after the last epoch.
If we want to continue training after the last epoch, we can modify the line
'optimizer': None if final_epoch else optimizer.state_dict()
to
'optimizer': optimizer.state_dict()
in train.py.
I'm not sure, but it seems to work for me.

# Save training results
        save = (not opt.nosave) or (final_epoch and not opt.evolve)
        if save:
            with open(results_file, 'r') as f:
                # Create checkpoint
                chkpt = {'epoch': epoch,
                         'best_fitness': best_fitness,
                         'training_results': f.read(),
                         'model': model.module.state_dict() if hasattr(model, 'module') else model.state_dict(),
                         'optimizer': None if final_epoch else optimizer.state_dict()}
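
For completeness, a minimal round-trip sketch of the save/resume path (placeholder model, dummy values, and a made-up file name last_demo.pt; the real train.py also restores fields like training_results):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                                  # placeholder for the real model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Save a checkpoint the same way as above for an interrupted run (optimizer state kept)
    chkpt = {'epoch': 99,
             'best_fitness': 0.5,                             # dummy value
             'model': model.state_dict(),
             'optimizer': optimizer.state_dict()}
    torch.save(chkpt, 'last_demo.pt')

    # Resume: restore whatever the checkpoint carried
    chkpt = torch.load('last_demo.pt', map_location='cpu')
    model.load_state_dict(chkpt['model'])
    if chkpt.get('optimizer') is not None:                    # None only when the run finished normally
        optimizer.load_state_dict(chkpt['optimizer'])
    start_epoch = chkpt['epoch'] + 1                          # continue from epoch 100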

Ah, yes, thanks for the feedback! You are correct: --resume is really only intended for accidentally stopped training, i.e. you train to 300 epochs but your computer shuts down at 100. You can then use the exact same training command you originally used, plus --resume, to finish the training to 300.

If you train to 300/300 and then decide you want to train to 400, you are out of luck, because the LR scheduler has already decayed to near zero, and defining a new number of --epochs would create a discontinuity in the LR schedule. In this case you should restart your training from the beginning with --epochs 400.

And to answer your last point: we actually remove the optimizer on purpose after complaints about file sizes, as keeping it roughly doubles the size of the weight file, since the file then carries per-parameter optimizer state in addition to the parameters themselves.
