How can I add an additional output head without losing performance on original head?

I found the reason why I was getting these changes even though I had all the model parameters frozen. The running mean and variance in the batch normalisation layers were still updating: these running statistics are buffers rather than parameters, so setting requires_grad to False does not freeze them. To stop the batch normalisation layers from updating, you can run the code below after calling model.train(), which puts just those layers back into eval mode.

import torch.nn as nn

# Switch every BatchNorm2d layer to eval mode so its running
# statistics stop updating, while the rest of the model keeps training.
for name, module in model.named_modules():
    if isinstance(module, nn.BatchNorm2d):
        module.eval()
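As a quick sanity check, you can snapshot a layer's running mean before a forward pass and confirm it no longer changes. This is a minimal sketch; torchvision's resnet18 (and its bn1 layer) is used purely as a stand-in for your own model.

import torch
import torchvision

model = torchvision.models.resnet18()  # example model only
model.train()
for module in model.modules():
    if isinstance(module, torch.nn.BatchNorm2d):
        module.eval()

before = model.bn1.running_mean.clone()
model(torch.randn(2, 3, 224, 224))                   # one forward pass
assert torch.equal(before, model.bn1.running_mean)   # stats unchanged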

Alternatively, you can save the batch normalisation layers' running statistics before training and reload them afterwards, as sketched below.
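Here is one minimal sketch of that save-and-restore approach. The helper names snapshot_bn_stats and restore_bn_stats are my own for illustration, not a PyTorch API; they only use the standard running_mean and running_var buffers.

import torch
import torch.nn as nn

def snapshot_bn_stats(model):
    # Copy running_mean/running_var from every BatchNorm2d layer.
    return {name: {"mean": m.running_mean.clone(),
                   "var": m.running_var.clone()}
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}

def restore_bn_stats(model, stats):
    # Write the saved statistics back into the matching layers.
    with torch.no_grad():
        for name, m in model.named_modules():
            if name in stats:
                m.running_mean.copy_(stats[name]["mean"])
                m.running_var.copy_(stats[name]["var"])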

Hope this helps anyone else as well 🙂
