I have a couple of things I'd like to ask about the proper usage of the pretrained models offered by PyTorch. (I'm trying to build an SSD detection model with a pretrained MobileNetV2 as the backbone.)
- It is mentioned in the docs that pretrained models expect inputs
to be loaded into the range [0, 1] and then normalized using
mean = [0.485, 0.456, 0.406]
and std = [0.229, 0.224, 0.225]
I believe these hardcoded values are the per-channel means and stds of the images in the ImageNet dataset. However, since I will be training on COCO, should these be changed to the per-channel means and stds of the COCO dataset (or, in general, of whatever the current training set is)?
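In case it helps clarify the question: if I were to recompute these statistics myself, I'd do something like the sketch below (the fake batches stand in for my COCO DataLoader, and the inputs are assumed to already be scaled to [0, 1]).

```python
import torch

def channel_stats(batches):
    """Accumulate per-channel mean/std over batches of shape (B, 3, H, W)."""
    n = 0
    s = torch.zeros(3)
    sq = torch.zeros(3)
    for x in batches:
        n += x.size(0) * x.size(2) * x.size(3)   # pixels per channel seen so far
        s += x.sum(dim=(0, 2, 3))
        sq += (x ** 2).sum(dim=(0, 2, 3))
    mean = s / n
    std = (sq / n - mean ** 2).sqrt()            # E[x^2] - E[x]^2
    return mean, std

# Stand-in for iterating a real COCO DataLoader
batches = [torch.rand(4, 3, 32, 32) for _ in range(5)]
mean, std = channel_stats(batches)
```

For uniform random inputs this should come out near mean ≈ 0.5 and std ≈ 0.289, which is a cheap sanity check on the accumulation.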
- I am currently freezing the backbone for the first few epochs, then unfreezing it and training the whole model. However, as soon as I set the requires_grad flag on the backbone's parameters to True, the loss rises by a significant amount and never really recovers. What could cause this behaviour, and how should I try to solve it?
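To be concrete about what I mean by freezing/unfreezing, this is roughly my setup (with a tiny stand-in backbone; the real one is MobileNetV2's feature extractor):

```python
import torch.nn as nn

# Stand-ins for the pretrained backbone and the SSD head
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
head = nn.Conv2d(8, 4, 3)

def set_backbone_trainable(trainable: bool):
    """Toggle gradient computation for all backbone parameters."""
    for p in backbone.parameters():
        p.requires_grad = trainable

set_backbone_trainable(False)   # epochs 0..k: train the head only
# ... a few epochs later ...
set_backbone_trainable(True)    # unfreeze: this is where the loss blows up
```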
Possibilities I am currently considering are:
- the learning rate is too high for the backbone (which may have been pretrained with a lower one), and the large gradient updates wreck the pretrained weights as soon as they are unfrozen.
- I am not normalizing the inputs correctly
Should I try lowering the learning rate before unfreezing, or perhaps unfreezing progressively (layer by layer), or are there other solutions?
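One fix I'm considering is giving the backbone its own, much smaller learning rate via optimizer parameter groups; a sketch of what I have in mind (the modules and the lr values here are just placeholders):

```python
import torch

# Stand-ins for the backbone and detection head
backbone = torch.nn.Linear(4, 4)
head = torch.nn.Linear(4, 2)

# Separate parameter groups: be gentle on the pretrained backbone,
# keep a larger lr for the freshly initialized head.
optimizer = torch.optim.SGD(
    [
        {"params": backbone.parameters(), "lr": 1e-4},
        {"params": head.parameters(), "lr": 1e-2},
    ],
    momentum=0.9,
)
```

Would this be a reasonable way to handle the unfreezing step, or is progressive unfreezing generally preferred?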
Many thanks