I’m applying transfer learning to a resnet50 from torchvision.models, i.e., I replaced the last fc layer with a 2-neuron layer for a binary classification problem. All other layers are frozen. I started training with a learning rate of 1e-2, reducing it by a factor of 0.1 if the validation loss has not decreased for 5 epochs. However, when the learning rate drops from 1e-2 to 1e-3 for the first time, there is a sudden and very large increase in training accuracy (and a corresponding decrease in the loss): it jumps from around 54% to 96%. I attach the loss and accuracy plots. I use cross-entropy loss. What might be the reason for this sudden, large jump in training accuracy?
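For reference, here is a minimal sketch of my setup (simplified; the choice of SGD as the optimizer is just illustrative, while the scheduler values match what I described above):

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a pretrained ResNet-50 and freeze all of its layers
model = models.resnet50(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the final fc layer with a 2-neuron head for binary classification
# (the new layer's parameters are trainable by default)
model.fc = nn.Linear(model.fc.in_features, 2)

criterion = nn.CrossEntropyLoss()

# Optimizer choice (SGD) is an assumption; lr=1e-2 as described above
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-2)

# Reduce the lr by a factor of 0.1 if the validation loss hasn't improved
# for 5 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=5)

# ... training loop; after validating each epoch, call:
# scheduler.step(val_loss)
```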
Your initial learning rate might be too high. Lowering the learning rate might “start” the training.
If your dataset is quite easy (based on the step size, it seems you are using very few samples), the model might instantly overfit the training data.
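Concretely, lowering the initial learning rate is just a change to how the optimizer is constructed; 1e-3 below is only an example value:

```python
# Example: start from a smaller initial learning rate (1e-3 is illustrative)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3)
```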
Actually, you’re right. My dataset is not very large. I suspected the learning rate was too high, so I started training with a learning rate of 1e-3 instead. In that case the model doesn’t fully overfit (training accuracy doesn’t reach ~100%), i.e., I didn’t observe such a dramatic increase. However, the validation accuracy also doesn’t increase accordingly. This is also overfitting, I guess, because the model can’t carry what it seems to learn on the training set over to the validation set.
What surprised me is that, after jumping around the minimum at the high learning rate, a single epoch at the lower rate was enough to reach almost 100% training accuracy. It didn’t make sense to me. Thanks for your response.