PyTorch Transfer Learning Methodology

How does PyTorch do transfer learning? Does it just take the last layer of resnet and feed it into a new classifier for my categories, or does it retrain all the way back through resnet?

Also, apart from what is in this link, are there other implementations of transfer learning in PyTorch?

Which part of the model should be fine-tuned depends on you and your use case.

Andrej Karpathy explained in Stanford’s CS231n that you can use the size and similarity of your data as a baseline:

  1. New dataset is small and similar to the original dataset. Since the data is small, it is not a good idea to fine-tune the ConvNet due to overfitting concerns. Since the data is similar to the original data, we expect higher-level features in the ConvNet to be relevant to this dataset as well. Hence, the best idea might be to train a linear classifier on the CNN codes.
  2. New dataset is large and similar to the original dataset. Since we have more data, we can have more confidence that we won’t overfit if we were to try to fine-tune through the full network.
  3. New dataset is small but very different from the original dataset. Since the data is small, it is likely best to only train a linear classifier. Since the dataset is very different, it might not be best to train the classifier from the top of the network, which contains more dataset-specific features. Instead, it might work better to train the SVM classifier from activations somewhere earlier in the network.
  4. New dataset is large and very different from the original dataset. Since the dataset is very large, we may expect that we can afford to train a ConvNet from scratch. However, in practice it is very often still beneficial to initialize with weights from a pretrained model. In this case, we would have enough data and confidence to fine-tune through the entire network.

This heuristic has worked out well for me most of the time.
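As a concrete sketch of case 1 ("train a linear classifier on the CNN codes"): the frozen backbone is run under `torch.no_grad()` to produce features, and only a small linear head is trained on top. A toy backbone is used here instead of a real resnet (an assumption, so the example needs no downloads):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pretrained backbone (assumption: in practice this would
# be e.g. torchvision.models.resnet18 with its final fc layer stripped).
backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
for p in backbone.parameters():
    p.requires_grad = False  # case 1: the backbone stays frozen

head = nn.Linear(8, 5)  # new linear classifier for 5 hypothetical classes

x = torch.randn(4, 3, 32, 32)
with torch.no_grad():     # "CNN codes": features from the frozen network
    codes = backbone(x)
logits = head(codes)      # only the head's parameters receive gradients
```

Since the backbone never needs gradients, its features can also be precomputed once for the whole dataset, which makes this approach very cheap to train.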


Thanks for the response. Can you please specify which one the tutorial is using? I assumed the link I am sharing is fine-tuning and is not backpropagating the result of the new last layer’s classification through the entire pre-trained resnet. Also, going by what 1) says, it seems the link is not doing a good job, because we have a small dataset and we are fine-tuning a ConvNet? The PyTorch tutorial on transfer learning and 1) now seem contradictory to me. Can you please explain if I am wrong?

In the tutorial two approaches are shown.
The first part shows how to fine-tune the whole model, while the second uses the pre-trained model as a fixed feature extractor and trains only the new classifier (a linear layer).
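The practical difference between the two approaches comes down to which parameters stay trainable and which are handed to the optimizer. A minimal sketch, using a toy two-layer model in place of resnet (an assumption, so nothing needs to be downloaded):

```python
import torch
import torch.nn as nn

# Toy model standing in for a pretrained network (assumption).
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

# Approach 1: fine-tune the whole model -> all parameters go to the optimizer.
opt_full = torch.optim.SGD(model.parameters(), lr=1e-3)

# Approach 2: fixed feature extractor -> freeze everything, replace the head,
# and give the optimizer only the new head's parameters.
for p in model.parameters():
    p.requires_grad = False
model[2] = nn.Linear(8, 2)  # a fresh layer has requires_grad=True by default
opt_head = torch.optim.SGD(model[2].parameters(), lr=1e-3)

# Only the new head (8*2 weights + 2 biases = 18 values) remains trainable.
n_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```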

Well, I wouldn’t see it as not doing a good job. The tutorial explains all the steps necessary to fine-tune your model using PyTorch. The data, accuracy, etc. are not the main focus in my opinion.
Sure, with a small dataset of natural images, you would most likely want to train just the classifier (case 1).


The tutorial is backpropagating through the complete network. You can see that it replaces the last model_ft.fc layer but passes all of model_ft.parameters() to optimizer_ft.
It’s a tutorial, so its purpose is more to give a sense of PyTorch functionality than to achieve the SOTA on any benchmark.
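This can be checked directly: after replacing the head and calling backward(), every layer (not only the new one) has a gradient, because all of the model’s parameters were left trainable. A small sketch with a toy model standing in for resnet (an assumption):

```python
import torch
import torch.nn as nn

# Toy model in place of the tutorial's resnet (assumption).
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))
model[2] = nn.Linear(3, 2)  # replace the last layer, like model_ft.fc

# Same pattern as the tutorial: the optimizer sees ALL parameters.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

loss = model(torch.randn(5, 4)).sum()
loss.backward()

# Gradients exist for the old layers too, so the whole network is fine-tuned.
has_grad = [p.grad is not None for p in model.parameters()]
```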


I have trained model A using transfer learning (from a pretrained FasterRCNN model). Now I want to use model A itself as a pretrained model, but I am running into issues.

step 1: FasterRCNN -> model A
step 2: model A -> model B

First, I am not sure whether this is a good approach or not; I am just trying it.

Code sample and the issue:

# Replace the pre-trained head with a new one (note: +1 because of the __background__ class)
in_features = self._model.roi_heads.box_predictor.cls_score.in_features


in_features = self._model.roi_heads.box_predictor.cls_score.in_features
AttributeError: 'collections.OrderedDict' object has no attribute 'roi_heads'

Please let me know what I am doing wrong and how to do it.

Thanks in advance :blush:

Based on the error message it seems torch.load('modelA.pth') loads a state_dict (which is an OrderedDict).
If that’s the case (which is also the recommended way of loading a model), you would have to create the model object first and then load the state_dict as seen here:

model = FasterRCNN()
state_dict = torch.load('modelA.pth', map_location='cpu')
model.load_state_dict(state_dict)
in_features = model.roi_heads...

If I have a dataset of about 20k images that is very different from the original dataset, which approach should I use?