Performance drop when freezing BatchNorm. Why?

Transfer learning usually freezes the pre-trained model's BatchNorm running stats (running mean and variance), so I did that, but it gives worse performance than when they are left unfrozen. To be precise, convergence is fast, but the model generalizes poorly and the final performance is not good.
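Concretely, this is roughly how I froze the running stats (a minimal sketch; the torchvision ResNet-50 backbone and the 10-class head are just stand-ins for my actual setup):

```python
import torch.nn as nn
import torchvision

# Stand-in for my pretrained backbone and new classification head
model = torchvision.models.resnet50(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, 10)

def freeze_bn_stats(module):
    # eval() makes BatchNorm use the stored running stats instead of batch stats
    # and stops updating them; the affine weight/bias can still receive gradients.
    if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        module.eval()

model.train()
model.apply(freeze_bn_stats)  # has to be re-applied after every model.train() call
```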
To my knowledge, this is because the domain of the pre-trained model and my dataset are quite different. Is it right to unfreeze the BatchNorm running stats in my use case, or are there other tricks for finetuning?
I'm still suspicious that unfreezing them would make the pretrained model meaningless (it cannot prevent forgetting of the pretrained statistics). See the sketch below for what I mean by unfreezing.
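For reference, this is roughly what I mean by unfreezing the running stats while keeping the pretrained weights frozen (again a sketch with the same stand-in backbone):

```python
import torch.nn as nn
import torchvision

# Alternative I'm considering: freeze the pretrained weights,
# but let BatchNorm re-estimate running_mean / running_var on my data.
model = torchvision.models.resnet50(weights="DEFAULT")
model.fc = nn.Linear(model.fc.in_features, 10)  # stand-in 10-class head

for p in model.parameters():
    p.requires_grad = False        # freeze all pretrained weights
model.fc.requires_grad_(True)      # train only the new head

model.train()  # in train mode, BN layers update their running stats again,
               # even though their affine parameters stay frozen
```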