Are LR schedulers applied on the training set or the validation set? If a paper refers to the validation loss specifically, does that imply we use the scheduler on the validation set? And are LR schedulers stepped after each mini-batch or after each epoch?
The learning rate scheduler can only be applied during the learning phase, i.e. during training. No learning is done during the validation / evaluation phase, so it doesn’t make sense to speak of an LR scheduler there. All that happens during validation is that the model uses the weights it has already learned during training to make predictions and compares them with the expected / actual values. The model’s weights do not change during validation.
During training you can use an LR scheduler if you want your LR to change, as opposed to staying constant. In many cases, reducing the learning rate and making the optimiser steps smaller and smaller as you approach the loss function minimum can be beneficial; however, this really depends on your dataset.
You can apply the LR scheduler after each mini-batch if you want, or you could choose to change your learning rate once per epoch. That depends on your use case and model configuration; in large NLP transformer models, I often apply the LR scheduler step after every mini-batch. If you have a specific model / use case in mind, let us know, and people specialised in that area will be able to give more specific advice.
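To make the two placements concrete, here is a minimal sketch in PyTorch. The model, data, and `StepLR` hyperparameters are placeholders I chose for illustration, not values from any particular setup; the only point is where `scheduler.step()` sits relative to the loops.

```python
import torch

# Toy model and optimiser just to show where scheduler.step() goes;
# these are placeholders, not a real training setup.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

# Halve the LR every 5 optimiser steps: per-mini-batch stepping,
# as is common when training large NLP transformers.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(2):
    for batch in range(5):  # stand-in for iterating a DataLoader
        optimizer.zero_grad()
        loss = model(torch.randn(8, 4)).sum()
        loss.backward()
        optimizer.step()
        scheduler.step()  # LR changes after every mini-batch

# For per-epoch stepping, simply move scheduler.step() out of the
# inner loop so it runs once per epoch instead.
```

After the 10 scheduler steps above, the LR has been halved twice (0.001 → 0.00025); with per-epoch stepping it would only have been called twice in total.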
@AndreaSottana thank you for the explanation. I am using EfficientNet-B0 for a 4-class classification problem. I am trying to replicate a paper, which states: ‘we used the input size of 224×224 pixels, train the model at a learning rate of 0.001 and decrease with a factor of 0.1 if the validation score does not decrease after 5 epochs.’ So, in this context, should I step the scheduler once per epoch?
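For what it's worth, that quoted setup maps naturally onto `ReduceLROnPlateau` stepped once per epoch. A minimal sketch, with several assumptions on my part: the tiny linear model stands in for EfficientNet-B0 (the scheduler wiring is identical), the Adam optimiser is my guess since the excerpt doesn't name one, and I treat the "validation score" as a loss to minimise (`mode="min"`); only `lr=0.001`, `factor=0.1`, and `patience=5` come from the quote.

```python
import torch

# Stand-in model: swap in torchvision's efficientnet_b0 in practice.
# The optimiser choice (Adam) is an assumption, not from the paper.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# factor=0.1 and patience=5 match the quoted paper: cut the LR to a
# tenth if the validation metric hasn't improved for 5 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5
)

# Simulated per-epoch validation losses: one improvement, then a plateau.
val_losses = [0.9, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8, 0.8]
for val_loss in val_losses:
    # In a real loop: train for one epoch, then compute val_loss here.
    scheduler.step(val_loss)  # called once per epoch, after validation

# Once the plateau exceeds the patience, the LR drops from 1e-3 to 1e-4.
```

Note that `scheduler.step(val_loss)` takes the metric as an argument, unlike the schedulers that step on a fixed timetable.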
I’m afraid I do not have much experience with CNNs / computer vision, as those models can be quite different from my NLP work. From what you quoted, it seems they decrease the LR if the validation score has not improved for 5 epochs, so I would say changing the LR after an epoch rather than a mini-batch makes most sense, but perhaps others more expert in these models can give their view. Let’s see if a CV expert can confirm this.
In general, when you train your model for many epochs (say 30-50), it makes sense to change the LR after each epoch, but in NLP you normally only train for 2-4 epochs as the models can be huge, so you need to adjust the LR after each mini-batch. Hope this helps.
Some schedulers, e.g. `ReduceLROnPlateau`, expect to track a metric in order to lower the learning rate.
Usually you would split your dataset into training, validation (, and test) and use the validation loss to trigger the learning rate reduction.
The idea behind it is that once your validation loss doesn’t decrease anymore (or not significantly), lowering the learning rate might help decrease it further.
However, you could use whichever metric you like.
PS: other learning rate schedulers do not expect any input and follow some other schedule, e.g. using epochs as milestones.
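For example, `MultiStepLR` steps on the epoch count alone and tracks no metric. A quick sketch (the milestones and gamma here are arbitrary values I picked for illustration):

```python
import torch

# Toy model and optimiser; only the scheduler setup matters here.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Drop the LR by 10x at epochs 10 and 20; no metric is tracked.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10, 20], gamma=0.1
)

for epoch in range(30):
    optimizer.step()   # placeholder for one epoch of training
    scheduler.step()   # driven purely by the epoch count, no argument

# After passing both milestones, the LR is 0.1 * 0.1 * 0.1 = 0.001.
```

Unlike `ReduceLROnPlateau`, `scheduler.step()` here takes no metric argument: the schedule is fixed in advance.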