Transfer learning - Question about frozen layers

Hi everyone!
I’m running some transfer learning experiments with AlexNet. In particular, I tried 3 different setups for my problem; in all of them the last fully connected layer is, of course, replaced to match the number of classes required:

  • I trained the net with the weights from ImageNet without freezing any layer;
  • Same as before, but with the convolutional layers frozen, keeping only the classifier part with requires_grad=True;
  • Same as the first, but with the fully connected layers frozen, keeping only the feature part with requires_grad=True (see the sketch after this list).
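
For reference, the freezing looks roughly like this (a minimal sketch assuming torchvision’s AlexNet; num_classes is just a placeholder for my actual task):

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # placeholder for the number of classes of my task

# AlexNet loaded with the ImageNet weights
# (newer torchvision versions use the weights= argument instead of pretrained=True)
model = models.alexnet(pretrained=True)

# In all three setups the last fully connected layer is replaced
# to match the required number of classes
model.classifier[6] = nn.Linear(model.classifier[6].in_features, num_classes)

# Second setup: freeze the convolutional (feature) part,
# so only the classifier keeps requires_grad=True
for param in model.features.parameters():
    param.requires_grad = False

# The third setup is the mirror image: freeze model.classifier
# (except the new last layer) and leave model.features trainable
```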

Well, to the best of my knowledge, the first setup can lead to losing the previously learned information and thus a poor result; the second should be the recommended one: exploiting the knowledge obtained on ImageNet for feature extraction, my net learns only the classification weights. Freezing the FCs instead, I guess knowledge can again be lost, and the classifier part won’t be well suited for my task.

Now, the question is… Why do I get almost equal results for both accuracy and loss in the 3 different cases?

It really depends on the task. Your model may just be at the point where it’s already able to do the task without adjusting its weights much (hence the frozen components don’t matter). It could also be that the unfrozen components can each still adapt on their own and do just fine. It’s hard to say without knowing the transfer learning task and taking a deep look at how the model is training.

I would agree that your second option is the most traditional way of performing transfer learning. I don’t think the first option is all that bad, but if you are confident that the features learned by the model are sufficient, then it makes sense to just freeze those layers (and probably gain a training speed-up).
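
For example, something along these lines is where the speed-up usually comes from (just a rough sketch, assuming torchvision’s AlexNet and an arbitrary 10-class task): only the parameters that still require gradients are handed to the optimizer, so the frozen feature extractor is never updated.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Pretrained AlexNet with a new 10-class head (10 is arbitrary here)
model = models.alexnet(pretrained=True)
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 10)

# Freeze the feature extractor
for param in model.features.parameters():
    param.requires_grad = False

# Pass only the still-trainable parameters to the optimizer;
# autograd skips gradients for the frozen layers, which saves compute
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable_params, lr=1e-3, momentum=0.9)
```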

Many thanks for the reply!
So, if I load a net with the ImageNet weights, it’s not automatic that training it again on a small dataset will cause a loss of the previous knowledge, right?

To clarify, you’re asking whether, if we take a pretrained network (without freezing any layers) and train it on a transfer task, the network will get worse (or at least, whether that’s possible)?

I would say not necessarily. I would suspect that early on in training, since the last layer was just freshly initialized, the pretrained network may get “worse”. But I would imagine this only happens early in training. This is just coming from my personal experience, so take it as you will.

Yes, that is exactly what I meant.

Ok, thanks again, now I understand.