Scheduler.step() after each epoch or after each minibatch

Hi, I defined an exp_lr_scheduler like

exp_lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)

But I was wondering what the best way to use it is: after each epoch or after each minibatch?
USE CASE 1

for epoch in range(num_epoch):
  scheduler.step()
  for img, labels in train_loader:
    .....
    optimizer.zero_grad()
    optimizer.step()

This one gives this warning:

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`.
In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.

USE CASE 2

for epoch in range(num_epoch):
  for img, labels in train_loader:
    .....
    optimizer.zero_grad()
    optimizer.step()
  # At the end of the epoch
  scheduler.step()

This way there is no warning, since optimizer.step() is called before scheduler.step(). But scheduler.step() still runs only once per epoch.
USE CASE 3

for epoch in range(num_epoch):
  for img, labels in train_loader:
    .....
    optimizer.zero_grad()
    optimizer.step()
    scheduler.step()  # runs for each minibatch

Here, scheduler.step() runs for each minibatch.
What do you think is the right way to use such a scheduler?
Thank you.


Use case #2 is the best. That is the one recommended by the PyTorch docs. You could use #3 if you set the step_size really high, but stepping per minibatch with the same step_size will usually lower the learning rate too fast.
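For reference, here is a fuller sketch of the recommended per-epoch pattern. The model, loss, and data below are just placeholders to make the call order concrete:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data; only the order of the calls matters here.
model = torch.nn.Linear(10, 2)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=40, gamma=0.1)
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16,
)

for epoch in range(100):
    for img, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(img), labels)
        loss.backward()
        optimizer.step()      # update the weights first
    scheduler.step()          # then decay the LR, once per epoch

If you really wanted to step StepLR per minibatch instead, you would have to scale the step size accordingly, e.g. step_size=40 * len(train_loader), so the decay still happens after the same number of updates.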


Hi!

Is updating the LR scheduler when the epoch finishes always the best solution? There are examples like get_linear_schedule_with_warmup from HuggingFace that seem to be applied at batch level (github huggingface/transformers/issues/2375, transformers/run_openai_gpt.py at 594ca6deadb6bb79451c3093641e3c9e5dcfa446 · huggingface/transformers · GitHub). Moreover, I’ve tried to check the fairseq code and, apparently, they apply the LR scheduler once per batch (in the default configuration, if I’m not wrong) instead of once per epoch (fairseq/trainer.py at cba35cdbca8385fe00045796ea731c0296f5434b · facebookresearch/fairseq · GitHub → train_step() is invoked by train(), which is invoked by the fairseq_train CLI, and train_step() calls set_num_updates() per batch, which updates the number of steps and the LR scheduler), but I’m not completely sure since I don’t know the code very well.
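As far as I can tell, those schedulers are defined in terms of optimizer updates rather than epochs, which is why they are stepped once per batch. A rough LambdaLR sketch of the same idea (the warmup/total step counts below are made-up numbers, not the HuggingFace defaults):

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Roughly what get_linear_schedule_with_warmup builds: linear warmup to the
# base LR over warmup_steps updates, then linear decay towards 0 at total_steps.
warmup_steps, total_steps = 100, 1000  # made-up numbers

def lr_lambda(step):
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

# In the training loop, per batch:
#     loss.backward()
#     optimizer.step()
#     scheduler.step()  # stepped once per update, not once per epoch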

I’ve tried to look for the answer, but I’m still not sure what the best solution is, or even whether it depends on the specific LR scheduler or task…

Bump. I’d like to know too. I see that recent large models with very large datasets tend to be trained by step count instead of by epoch, and they train for only a few epochs overall due to the dataset size. If that’s the case, then the scheduler step should be called at every step or every n steps, no? Is there a rule of thumb, guideline, or something? Or should it just be a hyperparameter to tune?
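For concreteness, the kind of step-based pattern I mean would be something like this (a schedule defined over a total number of optimizer updates and stepped every batch, here with OneCycleLR; the numbers are just placeholders):

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Schedule defined over total optimizer updates, not epochs (placeholder numbers).
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=0.1, total_steps=10_000
)

# Per batch:
#     optimizer.step()
#     scheduler.step()  # called once per update, total_steps times overall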