I have implemented ReduceLROnPlateau in my code and have a question about its verbose output.
With verbose=True and patience=1, it prints the following:
...
...
Epoch: 9/100
validation_loss: 0.9176
Epoch: 10/100
validation_loss: 0.8771
Epoch 9: reducing learning rate of group 0 to 2.5000e-04.
validation_loss: 0.8286
Epoch: 12/100
validation_loss: 0.8203
Epoch: 13/100
validation_loss: 0.8343
Epoch: 14/100
validation_loss: 0.8316
Epoch 13: reducing learning rate of group 0 to 2.5000e-05.
...
...
What I find strange is that I would expect it to print:
Epoch 10: reducing learning rate of group 0 to 2.5000e-04.
Instead of:
Epoch 9: reducing learning rate of group 0 to 2.5000e-04.
As I see it, the learning rate changes at epoch 10, not at epoch 9. Likewise, for the second verbose message I would expect epoch 14 instead of epoch 13.
You are either manually passing the epoch argument to step() (and might be passing the wrong value), or else the internal last_epoch counter is incremented and used instead, as seen here. I don't know exactly when you are calling the learning rate scheduler, so could you post a minimal, executable code snippet showing the epoch mismatch?
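To make the suspected off-by-one concrete, here is a toy, pure-Python sketch of that counting (an assumption inferred from the log above, not the actual PyTorch source): the scheduler's internal counter is 0-based, while the training loop prints 1-based epoch numbers, so the verbose message lags the loop's numbering by one.

```python
# Toy model (NOT the real ReduceLROnPlateau) of a 0-based internal
# epoch counter colliding with a 1-based training-loop printout.

class ToyPlateau:
    def __init__(self, patience=1):
        self.last_epoch = -1        # becomes 0 on the first step()
        self.patience = patience
        self.best = float("inf")
        self.num_bad_epochs = 0
        self.messages = []          # what verbose=True would print

    def step(self, metric, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1   # internal, 0-based count
        self.last_epoch = epoch
        if metric < self.best:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            self.messages.append(f"Epoch {epoch}: reducing learning rate")
            self.num_bad_epochs = 0

losses = [1.0, 0.9, 0.95, 0.96]       # loss plateaus after epoch 2
sched = ToyPlateau(patience=1)
for i, loss in enumerate(losses):
    print(f"Epoch: {i + 1}/4")        # 1-based, like the log above
    sched.step(loss)

print(sched.messages)   # the message says epoch 3, the loop said 4/4
```

Under this assumption, the scheduler reduces the learning rate during the loop iteration labeled "Epoch: 4/4" but reports it as epoch 3, which matches the 9-vs-10 and 13-vs-14 discrepancy in the log.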
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer, verbose=True)

for epoch in range(n_epochs):
    model.train()  # prep model for training
    for i, (local_batch, local_labels) in enumerate(training_generator):
        local_batch, local_labels = local_batch.to(device), local_labels.to(device)
        ...

    model.eval()  # prep model for evaluation
    running_val_loss = 0.0
    for local_batch, local_labels in validation_generator:
        local_batch, local_labels = local_batch.to(device), local_labels.to(device)
        output = model(local_batch)
        ...
        # calculate the loss
        val_loss = criterion(output, local_labels)
        # record validation loss
        running_val_loss += val_loss.item()

    valid_loss = running_val_loss / len(validation_generator)
    scheduler.step(valid_loss)
I tried updating my scheduler.step(valid_loss) to scheduler.step(valid_loss, epoch=epoch+1), and now it prints correctly. But why was the epoch parameter needed?
Epoch 19: reducing learning rate of group 0 to 2.0000e-04.
Epoch: 20/100
...
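Continuing the toy 0-based counter sketch from above (again an assumption, not the PyTorch source): passing a 1-based epoch to step() overrides the internal count, so the message aligns with the loop's own numbering. Note that in later PyTorch releases the epoch argument to step() is deprecated, so relying on your own logging of the current epoch is the safer long-term approach.

```python
# Same toy model as before; the only change is that the loop now
# passes its own 1-based epoch number to step().

class ToyPlateau:
    def __init__(self, patience=1):
        self.last_epoch = -1
        self.patience = patience
        self.best = float("inf")
        self.num_bad_epochs = 0
        self.messages = []

    def step(self, metric, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch       # caller's value overrides the count
        if metric < self.best:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            self.messages.append(f"Epoch {epoch}: reducing learning rate")
            self.num_bad_epochs = 0

losses = [1.0, 0.9, 0.95, 0.96]
sched = ToyPlateau(patience=1)
for i, loss in enumerate(losses):
    sched.step(loss, epoch=i + 1)     # pass the loop's 1-based epoch

print(sched.messages)   # now reports the same epoch the loop prints
```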