I have implemented ReduceLROnPlateau in my code and have a question about its verbose output.
With verbose=True and patience=1, it prints the following:
...
...
Epoch: 9/100
validation_loss: 0.9176
Epoch: 10/100
validation_loss: 0.8771
Epoch 9: reducing learning rate of group 0 to 2.5000e-04.
validation_loss: 0.8286
Epoch: 12/100
validation_loss: 0.8203
Epoch: 13/100
validation_loss: 0.8343
Epoch: 14/100
validation_loss: 0.8316
Epoch 13: reducing learning rate of group 0 to 2.5000e-05.
...
...
What I find strange is that I would expect it to print:
Epoch 10: reducing learning rate of group 0 to 2.5000e-04.
Instead of:
Epoch 9: reducing learning rate of group 0 to 2.5000e-04.
As I see it, the learning rate changes at epoch 10, not at epoch 9. Likewise, for the second verbose message I would expect epoch 14 instead of epoch 13.
You are either manually passing the epoch argument to step() (and might be passing the wrong value), or else the internal last_epoch counter is incremented and used instead, as seen here. I don't know exactly when you are calling the learning rate scheduler, so could you post a minimal, executable code snippet showing the epoch mismatch?
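To make the suspected off-by-one concrete, here is a toy, pure-Python sketch of that counting (an assumption inferred from the log above, not the actual PyTorch source): the scheduler's internal counter is 0-based, while the training loop prints 1-based epoch numbers, so the verbose message lags the loop's numbering by one.

```python
# Toy model (NOT the real ReduceLROnPlateau) of a 0-based internal
# epoch counter colliding with a 1-based training-loop printout.

class ToyPlateau:
    def __init__(self, patience=1):
        self.last_epoch = -1        # becomes 0 on the first step()
        self.patience = patience
        self.best = float("inf")
        self.num_bad_epochs = 0
        self.messages = []          # what verbose=True would print

    def step(self, metric, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1   # internal, 0-based count
        self.last_epoch = epoch
        if metric < self.best:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            self.messages.append(f"Epoch {epoch}: reducing learning rate")
            self.num_bad_epochs = 0

losses = [1.0, 0.9, 0.95, 0.96]       # loss plateaus after epoch 2
sched = ToyPlateau(patience=1)
for i, loss in enumerate(losses):
    print(f"Epoch: {i + 1}/4")        # 1-based, like the log above
    sched.step(loss)

print(sched.messages)   # the message says epoch 3, the loop said 4/4
```

Under this assumption, the scheduler reduces the learning rate during the loop iteration labeled "Epoch: 4/4" but reports it as epoch 3, which matches the 9-vs-10 and 13-vs-14 discrepancy in the log.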
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer=optimizer, verbose=True)

for epoch in range(n_epochs):
    model.train()  # prep model for training
    for i, (local_batch, local_labels) in enumerate(training_generator):
        local_batch, local_labels = local_batch.to(device), local_labels.to(device)
        ...

    model.eval()  # prep model for evaluation
    running_val_loss = 0.0
    for local_batch, local_labels in validation_generator:
        local_batch, local_labels = local_batch.to(device), local_labels.to(device)
        output = model(local_batch)
        ...
        # calculate the loss
        val_loss = criterion(output, local_labels)
        # record validation loss
        running_val_loss += val_loss.item()

    valid_loss = running_val_loss / len(validation_generator)
    scheduler.step(valid_loss)
I tried updating my scheduler.step(valid_loss) to scheduler.step(valid_loss, epoch=epoch+1), and now it prints correctly. But why was the epoch parameter needed?
Epoch 19: reducing learning rate of group 0 to 2.0000e-04.
Epoch: 20/100
...
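Continuing the toy 0-based counter sketch from above (again an assumption, not the PyTorch source): passing a 1-based epoch to step() overrides the internal count, so the message aligns with the loop's own numbering. Note that in later PyTorch releases the epoch argument to step() is deprecated, so relying on your own logging of the current epoch is the safer long-term approach.

```python
# Same toy model as before; the only change is that the loop now
# passes its own 1-based epoch number to step().

class ToyPlateau:
    def __init__(self, patience=1):
        self.last_epoch = -1
        self.patience = patience
        self.best = float("inf")
        self.num_bad_epochs = 0
        self.messages = []

    def step(self, metric, epoch=None):
        if epoch is None:
            epoch = self.last_epoch + 1
        self.last_epoch = epoch       # caller's value overrides the count
        if metric < self.best:
            self.best = metric
            self.num_bad_epochs = 0
        else:
            self.num_bad_epochs += 1
        if self.num_bad_epochs > self.patience:
            self.messages.append(f"Epoch {epoch}: reducing learning rate")
            self.num_bad_epochs = 0

losses = [1.0, 0.9, 0.95, 0.96]
sched = ToyPlateau(patience=1)
for i, loss in enumerate(losses):
    sched.step(loss, epoch=i + 1)     # pass the loop's 1-based epoch

print(sched.messages)   # now reports the same epoch the loop prints
```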