When I load my pretrained model without freezing or removing any layers and use an lr of 0.001, I get quite good results. Ex:
import torch
from torchvision import models

model = models.resnet18(pretrained=True)

def get_trainable(model_params):
    return (p for p in model_params if p.requires_grad)

optimizer = torch.optim.Adam(
    get_trainable(model.parameters()),
    lr=0.001,
)
However, when I do freeze layers for feature extraction, my model doesn’t learn at all. Ex:
import torch.nn as nn

def set_parameter_requires_grad(model, feature_extracting):
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

n_classes = 5
set_parameter_requires_grad(model, True)

# torchvision ResNets name the final layer `fc`, not `classifier`
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, n_classes)
There’s a third scenario I just ran into where I don’t freeze anything but replace the final layer to match my number of classes. In this case, my model overfits.
It’s still a bit confusing to me. How is the 1st case training at all if you aren’t adding a final layer that maps to the number of classes in your dataset?
It seems you are loading the network without modifying the final number of classes. For feature extraction this is OK.
Without modifications you are taking the ImageNet weights. Those weights come from a very large dataset with 1k classes, which is why the network generalizes so well as a feature extractor.
When you freeze the whole model except the final layer, that single linear layer is all that can adapt the pretrained features to your classes, and it may not be enough for the network to learn the mapping.
In the 3rd case it seems your dataset is not big enough, so fine-tuning everything overfits.
How to solve this?
You can think of the first layers as learning general filters, and the deeper layers as learning increasingly task-specific ones. You can try a smaller LR for the pretrained part, so those weights aren’t modified too much, and a normal learning rate for the last FC layer you are adding. You can even freeze the first layers, since those are truly general.