How to use transfer learning correctly (model zoo)?

I have a couple things I’d like to ask about the proper usage of the pretrained models offered by pytorch. (I’m trying to build an SSD detection model with a pretrained MobileNetV2 as backbone.)

  1. The docs mention that pretrained models expect inputs to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225].

I believe these hardcoded values are the means and stds per channel of images in the ImageNet dataset. However, as I will be training on COCO, should these be changed to the means and stds of the COCO dataset? (or, in general, the current training set?)
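If you do want to try dataset-specific statistics, a minimal sketch of computing per-channel mean and std over a training set could look like this (the `loader` is assumed to yield `[N, 3, H, W]` tensors already scaled to [0, 1]; the function name is illustrative):

```python
import torch

def channel_stats(loader):
    """Accumulate per-channel mean/std across an entire dataset."""
    n_pixels = 0
    channel_sum = torch.zeros(3)
    channel_sq_sum = torch.zeros(3)
    for images, _ in loader:
        # pixels per channel in this batch
        n_pixels += images.numel() // images.size(1)
        channel_sum += images.sum(dim=(0, 2, 3))
        channel_sq_sum += (images ** 2).sum(dim=(0, 2, 3))
    mean = channel_sum / n_pixels
    # var = E[x^2] - (E[x])^2
    std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()
    return mean, std
```

The results would then be passed to `transforms.Normalize(mean, std)` in place of the ImageNet values.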

  2. I am currently freezing the backbone for the first few epochs, then unfreezing it and training the whole model. However, as soon as I set the requires_grad flag on the backbone’s params to True, the loss rises by a significant amount and never really recovers. What could cause this behaviour, and how should I try to solve it?
    Possibilities I am currently considering:
  • the learning rate is too high for the backbone (which may have been pretrained with a lower one), so large gradients ruin the weights once they are unfrozen;
  • I am not normalizing the inputs correctly.

Should I try lowering the learning rate before unfreezing, or unfreezing progressively (layer by layer), or is there some other solution?
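For reference, the freeze-then-unfreeze schedule I have in mind looks roughly like this (the modules here are small placeholders standing in for the real MobileNetV2 backbone and SSD head, not the actual model definitions):

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the pretrained backbone and the detection head.
backbone = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
head = nn.Conv2d(16, 24, 3, padding=1)

# Phase 1: freeze the backbone so only the head receives gradients.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(head.parameters(), lr=1e-3, momentum=0.9)

# ... train the head for a few epochs ...

# Phase 2: unfreeze, but rebuild the optimizer with a much lower learning
# rate for the backbone so large updates don't wreck the pretrained weights.
for p in backbone.parameters():
    p.requires_grad = True
optimizer = torch.optim.SGD([
    {"params": backbone.parameters(), "lr": 1e-4},
    {"params": head.parameters(), "lr": 1e-3},
], momentum=0.9)
```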

Many thanks

Yes, these values are taken from ImageNet.
If you are retraining on data from another domain, you could adapt these values (or recalculate them from your dataset). It’s hard to tell whether you’ll see any performance improvement; you would have to run some experiments.

Your high-learning-rate hypothesis sounds reasonable, so I would also recommend lowering it for the backbone.
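Another option worth trying, as an illustration: instead of jumping straight to the full learning rate when you unfreeze, warm it back up over a few hundred steps. `LinearLR` (available in recent PyTorch versions) can do this; the model and step counts below are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder for the full SSD model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# After unfreezing, ramp the LR from 10% of its value back to 100%
# over 500 steps, so the freshly thawed backbone sees small updates first.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=500)

for step in range(3):  # training-loop sketch
    optimizer.step()   # (loss.backward() would precede this)
    scheduler.step()
```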
