Finetuning an "unfrozen" model does not preserve accuracy?

This is more of an accuracy/algorithm question. I would appreciate some empirical or theoretical advice.

Hi, I’m trying to fine-tune a ViT model on CIFAR100 (100 classes), starting from a pretrained checkpoint I got from the Hugging Face model hub. The checkpoint performs well on the ImageNet validation set (80+% accuracy).

Because the pretrained weights were trained on ImageNet (1k classes), I had to re-initialize the final classifier layer of the pretrained model and fine-tune it for the 100 CIFAR100 classes. Initially I thought that fine-tuning only the re-initialized classifier layer while freezing the rest would save a lot of training time. I went with this strategy and it worked: I reached 80% test accuracy within 20 epochs, and the accuracy did not really improve afterwards.
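For reference, this is roughly how I set up the frozen-backbone stage, shown here as a sketch with the `transformers` `ViTForImageClassification` API (the checkpoint name and learning rate below are placeholders, not necessarily what I used):

```python
import torch
from transformers import ViTForImageClassification

# Load an ImageNet-pretrained ViT and re-initialize the head for 100 classes.
model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",   # example checkpoint name
    num_labels=100,                  # CIFAR100 classes
    ignore_mismatched_sizes=True,    # drops the 1k-class head and re-initializes it
)

# Freeze the ViT backbone so only the new classifier receives gradient updates.
for param in model.vit.parameters():
    param.requires_grad = False

# Optimize only the freshly initialized classifier layer.
optimizer = torch.optim.AdamW(model.classifier.parameters(), lr=1e-3)
```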

Afterwards, I “un-froze” the ViT base model parameters and fine-tuned further. My expectation was that the model would start at 80% accuracy and improve with more epochs of fine-tuning. However, the accuracy tanked in the first epoch, and the accuracy curve for this run looked similar to training a ViT from scratch.
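The unfreezing step was essentially just re-enabling gradients for the backbone and handing all parameters to a new optimizer (a sketch; the learning rate shown is a placeholder standing in for the head-only value I reused):

```python
# Un-freeze the backbone so every parameter gets updated.
for param in model.vit.parameters():
    param.requires_grad = True

# Reusing the same (relatively large) head-only learning rate for the whole model.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```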

My question here is: shouldn’t fine-tuning the unfrozen model be at least as good as when the model was partially frozen? Does unfreezing the previously frozen layers for further fine-tuning hurt accuracy rather than speed up convergence? Should unfreezing the frozen layers effectively be considered as creating a whole new model?

I guess your learning rate was too large for fine-tuning the entire model and might have catapulted it away from the pretrained parameters. This is a common issue when fine-tuning a model; you might need to experiment with different learning rates and check at which point the accuracy decreases significantly.
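Something along these lines could be worth trying: restart each trial from the 80%-accuracy checkpoint and sweep a few smaller learning rates (the candidate values and the one-epoch budget are only placeholders):

```python
import copy
import torch

for lr in (1e-4, 3e-5, 1e-5):
    candidate = copy.deepcopy(model)  # restart from the 80%-accuracy checkpoint each time
    optimizer = torch.optim.AdamW(candidate.parameters(), lr=lr)
    # Train `candidate` for one epoch here, then compare its validation accuracy
    # against the frozen-backbone baseline to see which lr avoids the collapse.
```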

Thank you @ptrblck, I will try out different learning rates as you suggested. Meanwhile, I did find some references saying that different layers have different learning-rate sensitivities, so fine-tuning a fully unfrozen model could actually converge more slowly than training from scratch. If I can’t find a good learning rate, I think I will just train from scratch.
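If a single learning rate doesn’t work well, per-group learning rates (a much smaller rate for the pretrained backbone, a larger one for the new head) might be a middle ground. A minimal sketch, assuming the `model.vit` / `model.classifier` layout from above; the specific values are illustrative only:

```python
import torch

optimizer = torch.optim.AdamW([
    {"params": model.vit.parameters(),        "lr": 1e-5},  # gentle updates for the pretrained backbone
    {"params": model.classifier.parameters(), "lr": 1e-3},  # larger lr for the re-initialized head
])
```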