Correct way to use (order) the scheduler in PyTorch 1.6

John_J_Watson · September 23, 2020, 8:45am

I am looking for some guidance on the correct way to use/order scheduler.step() within epochs. Of course, the official guidance says (torch.optim — PyTorch 2.1 documentation):

PATTERN:0

scheduler = …

>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

But then I also see code like this (see for example: Learning Rate Scheduling - Deep Learning Wizard):

PATTERN:1

for epoch in range(num_epochs):
    # Decay Learning Rate
    scheduler.step()
    # Print Learning Rate
    print('Epoch:', epoch,'LR:', scheduler.get_lr())
    for i, (images, labels) in enumerate(train_loader):
        # Load images
        images = images.view(-1, 28*28).requires_grad_()

I also see a similar pattern in quite a few GitHub repos.

So, I did some small-scale experiments using these two patterns in PyTorch 1.6, and I got slightly better results using PATTERN:1.

So, my questions are:
[1] Does the order matter?
[2] Does the order depend on step size or epoch number or optimizer?

Also, I have another problem: I have two envs with pytorch 1.6 and I set PATTERN:1 for both. In one of the envs I get the warning:

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
/home/ok/ok0/ok1/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:351: UserWarning: To get the last learning rate computed by the scheduler, please use `get_last_lr()`.
  "please use `get_last_lr()`.", UserWarning)

but in the other env, I don't. I can't think why this could be!

Thank you!

tom · September 23, 2020, 9:06am

[1] Yes, use the pattern recommended by the documentation.
[2] No.

The background is that people found out that doing the LR step first gives unintuitive maths w.r.t. how/when the learning rate changes, so it is strictly recommended to do the LR step last. As always, advice on the internet may be outdated (which seems to be the case here; you could help out by filing an issue or PR on their GitHub) or even bad advice (probably not the case here: @ritchieng has been around for a long time and knows his stuff pretty well in my experience, but I have sometimes found that Stack Overflow had really strange recommendations).
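To make the "unintuitive maths" a bit more concrete, here is a minimal sketch that records which learning rate optimizer.step() actually sees in each epoch under the two orderings. The choice of StepLR with step_size=2 and gamma=0.1, the single dummy parameter, and the helper name lrs_used are all assumptions made for illustration; they are not taken from the original post.

```python
import warnings

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR


def lrs_used(scheduler_first, epochs=6):
    """Return the learning rate seen by optimizer.step() in each epoch."""
    # Hypothetical one-parameter "model", just enough to build an optimizer.
    param = torch.nn.Parameter(torch.zeros(1))
    optimizer = SGD([param], lr=0.1)
    scheduler = StepLR(optimizer, step_size=2, gamma=0.1)

    used = []
    with warnings.catch_warnings():
        # Silence the "lr_scheduler.step() before optimizer.step()" warning
        # so the printed comparison stays readable.
        warnings.simplefilter("ignore")
        for _ in range(epochs):
            if scheduler_first:
                scheduler.step()  # PATTERN:1, scheduler stepped at epoch start
            param.grad = torch.zeros_like(param)  # stand-in for backward()
            used.append(optimizer.param_groups[0]["lr"])
            optimizer.step()  # stand-in for train(...)
            if not scheduler_first:
                scheduler.step()  # PATTERN:0, scheduler stepped at epoch end
    return used


pattern0 = lrs_used(scheduler_first=False)
pattern1 = lrs_used(scheduler_first=True)
print("PATTERN:0", pattern0)
print("PATTERN:1", pattern1)
```

With these assumed settings, PATTERN:0 trains two epochs at each learning rate (roughly 0.1, 0.1, 0.01, 0.01, 0.001, 0.001), while PATTERN:1 spends only one epoch at the initial rate and then stays one epoch ahead (roughly 0.1, 0.01, 0.01, 0.001, 0.001, 0.0001). That shift is what the warning means by skipping the first value of the schedule, and it may also be why the two patterns gave slightly different results in a small experiment.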

Regarding the warning: this might be due to some more general warnings configuration on your system. If you believe it is because of PyTorch, you could double-check the versions (torch.__version__), look at which path torch.optim is loaded from in each env, and check whether the lr_scheduler.py there contains the warning.
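A rough sketch of those checks, to be run once in each env and compared. The substring searched for is copied from the warning text quoted above; the variable names are just illustrative, and looking at PYTHONWARNINGS and sys.warnoptions is one guess at where a "general configuration about warnings" could come from, not an exhaustive list.

```python
import inspect
import os
import sys

import torch
from torch.optim import lr_scheduler

# Which PyTorch is this env actually importing, and from where?
version = torch.__version__
module_path = lr_scheduler.__file__
print("torch version:", version)
print("lr_scheduler loaded from:", module_path)

# Does this copy of lr_scheduler.py contain the ordering warning at all?
source = inspect.getsource(lr_scheduler)
has_warning = "before `optimizer.step()`" in source
print("ordering warning present in source:", has_warning)

# Warning configuration that may differ between the two envs.
print("PYTHONWARNINGS:", os.environ.get("PYTHONWARNINGS"))
print("sys.warnoptions:", sys.warnoptions)
```

If both envs report the same version and path and both contain the warning, then the difference is more likely in how warnings are filtered (for example a -W flag, PYTHONWARNINGS, or a library calling warnings.filterwarnings) than in PyTorch itself.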

Best regards

Thomas