Fine-tuning torchvision models

Hi everyone,

I am using the pretrained models from torchvision, and there are 3 cases: feature extraction, fine-tuning a pre-trained model, and training from scratch.

First case, to extract features from images:

  • set model=resnet18(pretrained=True)
  • set param.requires_grad = False
  • and optimise only the Fully Connected layer, the last one.

Second case, to fine-tune the pre-trained model:

  • set model=resnet18(pretrained=True)
  • set param.requires_grad = True
  • and optimise Convolutional, Pooling and Fully Connected layers

Third case, to optimise the model from scratch:

  • set model=resnet18(pretrained=False)
  • set param.requires_grad = True
  • and optimise Convolutional, Pooling and Fully Connected layers

From my understanding, when you optimise the parameters of a CNN, you do it for the Convolutional, Pooling and Fully Connected layers. I cannot see the difference between the 2nd and 3rd cases. For me, in both cases you optimise all layers.

Thank you for your time and consideration.

The number of layers you are training in the case of fine-tuning depends on the amount of data you have and also on how similar the new dataset is to the one used for pretraining (ImageNet in the case of torchvision).
Have a look at the Fine-tuning notes of CS231n for a good overview.


ptrblck,

Thank you for your response.

I read this article about transfer learning and I understand how it works. My question was how case 2 differs from case 3.

Maybe I expressed my question in an incomprehensible way. In other words, how is it possible to use a pre-trained model and optimise the parameters of all layers by setting param.requires_grad = True? If you optimise the params of all layers, aren't you training your model from scratch, so you would set pretrained=False?

You are fine-tuning all parameters using the pretrained weights as the initial parameters.
The difference is that you won’t initialize them randomly using a torch.nn.init method, but start from the already pretrained parameters.
If you are using a similar dataset to the one used for pretraining, the model parameters will be already quite good. Changing all parameters “a bit” will fine-tune the model so that it hopefully works better for your new dataset.

ptrblck,

As always, a great answer! Now it is clear.

Thanks for your time and have a great day ✌️