Model restart with learning rate decay

Hey everyone,

I’m training an LSTM model and struggling to get it to converge. After the first update, the validation loss just kept going up and never improved over the following 100 epochs, even though I was using torch.optim.lr_scheduler.ReduceLROnPlateau.
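
For reference, this is roughly how the scheduler is wired up (the model and hyperparameters below are placeholders, not my exact config):

```python
import torch
import torch.nn as nn

# Placeholder model/hyperparameters, just to show the scheduler wiring
model = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=10
)

# ...then, at the end of every epoch, after computing the validation loss:
# scheduler.step(val_loss)
```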

In my mind, it makes sense that whenever you reduce the learning rate, you'd also roll back to the checkpoint that achieved the best validation loss so far. But I've rarely seen anyone mention this strategy, so I'm wondering whether there's a particular reason not to do it.
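
Concretely, the idea would look something like this (a minimal sketch with a dummy linear model and random data instead of my actual LSTM setup, just to show where the rollback would happen):

```python
import copy
import torch
import torch.nn as nn

# Dummy model/data purely for illustration
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)
loss_fn = nn.MSELoss()
x, y = torch.randn(64, 10), torch.randn(64, 1)

best_val = float("inf")
best_state = copy.deepcopy(model.state_dict())

for epoch in range(50):
    # stand-in for a full training epoch
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()

    # stand-in for the validation pass (reusing the same data here)
    with torch.no_grad():
        val_loss = loss_fn(model(x), y).item()

    # keep a copy of the best weights seen so far
    if val_loss < best_val:
        best_val = val_loss
        best_state = copy.deepcopy(model.state_dict())

    old_lr = optimizer.param_groups[0]["lr"]
    scheduler.step(val_loss)
    new_lr = optimizer.param_groups[0]["lr"]

    # the strategy in question: when the LR gets reduced,
    # roll back to the best-so-far checkpoint before continuing
    if new_lr < old_lr:
        model.load_state_dict(best_state)
```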