ReduceLROnPlateau not doing anything?

I’m trying to use the ReduceLROnPlateau scheduler, but it doesn’t seem to do anything, i.e. it never decreases the learning rate after my loss stops decreasing (and actually starts to increase quite a bit over multiple epochs).

Here is the code:

criterion = nn.MSELoss()
optimizer = optim.Adam(self.model.parameters(), lr=lr, weight_decay=weight_decay)
lr_scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, verbose=True)

for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(trainloader):
        x_batch, s_batch = data
        x_batch, s_batch = x_batch.to(self.device), s_batch.to(self.device)
        optimizer.zero_grad()
        outputs = self.model(x_batch)
        loss = criterion(outputs, s_batch)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()
    lr_scheduler.step(running_loss)  # pass the epoch metric to the scheduler

What am I missing? :confused:


How many epochs do you have?
The default value for patience is set to 10.
Maybe you have to lower it a bit?


no, I’m using 25 epochs and the loss stops decreasing around epoch 10 or 11 so that should be fine…

ok, super weird. I tried setting patience=0 and it worked - at epoch 11 it decreased the LR! I feel like this might be a bug?

Here is a small dummy example and it seems to work:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, patience=10, verbose=True)

for i in range(25):
    print('Epoch ', i)
    scheduler.step(i)  # feed an ever-increasing "loss" so the plateau triggers

Setting patience=0 reduces the lr until eps is met.
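
The eps check can be sketched like this (just mirroring the idea from the docs, not the actual implementation; `reduce_lr` is a made-up helper name):

```python
# Sketch of the eps check that stops further reductions:
# if the difference between old and new lr is smaller than eps,
# the update is ignored (illustration only, not the torch source).
def reduce_lr(lr, factor=0.1, eps=1e-8):
    new_lr = lr * factor
    return new_lr if lr - new_lr > eps else lr

lr = 1e-3
for _ in range(10):
    lr = reduce_lr(lr)
print(lr)  # bottoms out near 1e-8 and stays there
```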


Here is my output from training; this was with patience=1:

[0.0005]
[epoch 1] loss: 301427.7770613
[0.0005]
[epoch 2] loss: 160481.1660669
[0.0005]
[epoch 3] loss: 117763.6467930
[0.0005]
[epoch 4] loss: 81528.4339311
[0.0005]
[epoch 5] loss: 61726.3664427
[0.0005]
[epoch 6] loss: 46429.0652291
[0.0005]
[epoch 7] loss: 35138.6479489
[0.0005]
[epoch 8] loss: 23490.1695382
[0.0005]
[epoch 9] loss: 17156.3515503
[0.0005]
[epoch 10] loss: 13263.6756480
[0.0005]
[epoch 11] loss: 10980.2816269
[0.0005]
[epoch 12] loss: 11231.1720589
[0.0005]
[epoch 13] loss: 7322.0585567
[0.0005]
[epoch 14] loss: 9150.4534486
Epoch    14: reducing learning rate of group 0 to 5.0000e-05.
[5e-05]
[epoch 15] loss: 12504.9455976
[5e-05]
[epoch 16] loss: 935.6753716
[5e-05]
[epoch 17] loss: 793.3272605
[5e-05]
[epoch 18] loss: 719.4522403

From my understanding (or at least what I want), the LR should be decreased right after epoch 12, where the loss starts to increase. Why isn’t that the case? Does that have to do with the relative threshold? I found that part of the documentation pretty confusing…

if I have patience=10, then this is the output, i.e., the LR does not get reduced:

[0.0005]
[epoch 1] loss: 301427.7770613
[0.0005]
[epoch 2] loss: 160481.1660669
[0.0005]
[epoch 3] loss: 117763.6467930
[0.0005]
[epoch 4] loss: 81528.4339311
[0.0005]
[epoch 5] loss: 61726.3664427
[0.0005]
[epoch 6] loss: 46429.0652291
[0.0005]
[epoch 7] loss: 35138.6479489
[0.0005]
[epoch 8] loss: 23490.1695382
[0.0005]
[epoch 9] loss: 17156.3515503
[0.0005]
[epoch 10] loss: 13263.6756480
[0.0005]
[epoch 11] loss: 10980.2816269
[0.0005]
[epoch 12] loss: 11231.1720589
[0.0005]
[epoch 13] loss: 7322.0585567
[0.0005]
[epoch 14] loss: 9150.4534486
[0.0005]
[epoch 15] loss: 12504.9455976
[0.0005]
[epoch 16] loss: 10636.9359541
[0.0005]
[epoch 17] loss: 8429.2563446
[0.0005]
[epoch 18] loss: 5167.7016573
[0.0005]
[epoch 19] loss: 7893.0735840
[0.0005]
[epoch 20] loss: 7127.2575889
[0.0005]
[epoch 21] loss: 8848.1689746
[0.0005]
[epoch 22] loss: 7766.2236686
[0.0005]
[epoch 23] loss: 8599.2677275
[0.0005]
[epoch 24] loss: 9155.7630087
[0.0005]
[epoch 25] loss: 7540.2192680

so patience influences the threshold?!

As far as I can see from the losses, the minimum is at epoch 18 with 5167.
Even though the loss was a bit shaky before that, a new minimum was still found before the patience ran out. After epoch 18 the loss would have to stay above 5167 for the next 10 epochs.
If you want to change this behavior, you could also play around with cooldown:

cooldown ( int ) – Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0.

But since cooldown is 0 by default, there shouldn’t be any waiting. For patience=10 the loss decreases until epoch 11 (10980) and then increases in epoch 12 (11231), so I would assume that the LR is reduced after this increase (since the scheduler has no idea that a better minimum will come along later). But the loss value at which the LR is decreased depends on the patience, even though, from my understanding, patience should only determine after how many epochs the reducing starts, not how much worse the value has to get before the reducing begins (which, as I understand it, should be controlled by threshold). So this seems like a bug, no?

To expand on your minimal example:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, patience=10, verbose=True)

test_loss = [301427.7770613, 160481.1660669, 117763.6467930, 81528.4339311,
             61726.3664427, 46429.0652291, 35138.6479489, 23490.1695382,
             17156.3515503, 13263.6756480, 10980.2816269, 11231.1720589,
             7322.0585567, 9150.4534486, 12504.9455976, 10636.9359541,
             8429.2563446, 5167.7016573, 7893.0735840, 7127.2575889,
             8848.1689746, 7766.2236686, 8599.2677275, 9155.7630087,
             7540.2192680]

for i in test_loss:
    print('loss: ', i)
    scheduler.step(i)

-> with patience=10, the LR won’t be reduced at all; with patience=0 it is reduced after epoch 11; with patience=5 after epoch 23…

ok, I couldn’t help myself and created an issue: https://github.com/pytorch/pytorch/issues/11305 this just seems too weird to work as intended like this…

The patience is applied to the last minimal loss value and the subsequent values.
Let’s analyze the behavior for patience=0:

  • Until epoch10 the loss is decreasing (starting with epoch0).
  • The loss in epoch11 increases; since patience=0, we are decreasing the lr. The current min value is 10980 from epoch10.
  • The loss decreases to 7322 in epoch12, which is our new minimum.
  • After that, the lr is decreased in every epoch where the loss stays above the current minimum (epoch17 still sets a new minimum of 5167), until eps stops further reductions.

Let’s now have a look at patience=5:

  • Again the loss decreases until epoch10=10980, which is our current minimum.
  • The loss increases in epoch11; our “current” patience is now 4; the current minimum stays the same.
  • Epoch12 yields a new minimum of 7322. Patience is back to 5.
  • The loss stays above this minimum in the next 4 epochs. Patience is lowered to 4, 3, 2, 1.
  • Epoch17 yields a new minimum of 5167. Patience is back to 5.
  • The loss stays above this minimum for the rest of the epochs. Once the number of “bad” epochs exceeds the patience, the lr is decreased (at epoch23 here).
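
The counter logic above can be sketched in plain Python (a simplified re-implementation for illustration only; it ignores threshold, cooldown, and eps, and `simulate` is a made-up helper):

```python
# Simplified sketch of ReduceLROnPlateau's patience logic.
def simulate(losses, patience, factor=0.1, lr=5e-4):
    best = float('inf')
    num_bad_epochs = 0
    reductions = []              # 0-indexed epochs at which the lr was reduced
    for epoch, loss in enumerate(losses):
        if loss < best:          # a new minimum resets the counter
            best = loss
            num_bad_epochs = 0
        else:
            num_bad_epochs += 1
        if num_bad_epochs > patience:
            lr *= factor
            reductions.append(epoch)
            num_bad_epochs = 0
    return reductions

losses = [301427.8, 160481.2, 117763.6, 81528.4, 61726.4, 46429.1,
          35138.6, 23490.2, 17156.4, 13263.7, 10980.3, 11231.2,
          7322.1, 9150.5, 12504.9, 10636.9, 8429.3, 5167.7,
          7893.1, 7127.3, 8848.2, 7766.2, 8599.3, 9155.8, 7540.2]

print(simulate(losses, patience=10))  # → [] (never reduced)
print(simulate(losses, patience=5))   # → [23]
print(simulate(losses, patience=0))   # first reduction at epoch 11
```

With the losses from your log, this reproduces the reductions you observed for patience=0, 5, and 10.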

aaaahhh… ok, I thought patience only applied to the initial epochs, like “let the model fool around for the first X epochs, then start monitoring and decreasing the LR”. Sorry, this clears things up! A default value of 10 seems a bit high then, though.

Thanks so much!

No worries. I just realized your point of view a few moments ago. :wink:
Yeah, it might be a bit high, but it also depends on your dataset etc., so I think it’s a valid base value.

If you want to skip some initial epochs, you might want to add a condition like:

if epoch > 10:
    scheduler.step(loss)

to your training loop.


Hi @ptrblck, after the patience runs out and the lr is decreased to the minimum value allowed by eps, is there a chance the lr could go back to its original value if the loss begins to decrease again? Or, once eps is hit, is it stuck at that lr forever?

I think ReduceLROnPlateau only ever reduces the learning rate, so it would be stuck at this value.
However, you could adapt this scheduler to increase the learning rate again using a similar approach.
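
For example, a toy sketch of such an “increase on improvement” counterpart could look like this (purely illustrative; the class name and parameters are made up, it is not part of torch.optim, and it operates on a param_groups-style list of dicts so it runs without torch):

```python
# Toy counterpart to ReduceLROnPlateau: bump the lr back up after the
# loss has improved for more than `patience` consecutive epochs.
class IncreaseLROnImprovement:
    def __init__(self, param_groups, factor=10.0, patience=2, max_lr=1e-3):
        self.param_groups = param_groups  # list of dicts with an 'lr' key,
                                          # like optimizer.param_groups
        self.factor = factor
        self.patience = patience
        self.max_lr = max_lr
        self.best = float('inf')
        self.num_good_epochs = 0

    def step(self, loss):
        if loss < self.best:              # count consecutive improvements
            self.best = loss
            self.num_good_epochs += 1
        else:
            self.num_good_epochs = 0
        if self.num_good_epochs > self.patience:
            for group in self.param_groups:
                group['lr'] = min(group['lr'] * self.factor, self.max_lr)
            self.num_good_epochs = 0

groups = [{'lr': 1e-5}]
sched = IncreaseLROnImprovement(groups)
for loss in [5.0, 4.0, 3.0, 2.0, 1.0]:
    sched.step(loss)
print(groups[0]['lr'])  # increased roughly tenfold to ~1e-4
```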

Your commitment to support is unbelievable!
Thank you. :heart:


Agreeing with alirezakazemipour: I am learning from your posts every day. Thank you for being so active and patient.


Hi @ptrblck

Just a follow-up question on top of this discussion. Does the learning rate get reduced even if the difference between two epochs is very, very small (for patience=0)?

Or is there a condition that the difference between two epochs has to be larger than some value?

The threshold defines how large the change has to be to count as a new optimum; anything smaller will (eventually) trigger the learning rate reduction:

threshold (float) – Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.


Yeah, I also checked the source code of the class to make sure how it’s calculated. I think it has to do with the threshold_mode as well.

def is_better(self, a, best):
    if self.mode == 'min' and self.threshold_mode == 'rel':
        rel_epsilon = 1. - self.threshold
        return a < best * rel_epsilon

    elif self.mode == 'min' and self.threshold_mode == 'abs':
        return a < best - self.threshold

    elif self.mode == 'max' and self.threshold_mode == 'rel':
        rel_epsilon = self.threshold + 1.
        return a > best * rel_epsilon

    else:  # mode == 'max' and epsilon_mode == 'abs':
        return a > best + self.threshold
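
To make the default 'min'/'rel' branch concrete, here is the same check as a standalone function with the default threshold=1e-4 (`is_better_min_rel` is just a name I made up for this snippet):

```python
# Standalone version of the mode='min', threshold_mode='rel' branch:
# a loss only counts as a new optimum if it undercuts the best value
# by a relative margin of `threshold`.
def is_better_min_rel(a, best, threshold=1e-4):
    return a < best * (1.0 - threshold)

best = 1000.0
print(is_better_min_rel(999.95, best))  # → False (only 0.005% better)
print(is_better_min_rel(999.80, best))  # → True  (0.02% better than best)
```

So with the default settings, a loss improvement smaller than 0.01% of the current best is ignored and the bad-epoch counter keeps running.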

Thank you @ptrblck !!