How to perform finetuning in Pytorch?

micklexqg · September 23, 2017, 3:56am

Thanks for your code. I have one question: why is no drop rate given for dropout? Is there a default value?

D-X-Y · October 18, 2017, 11:08am

@apaszke I have a problem when using requires_grad=False with multiple GPUs. Using requires_grad=False to finetune resnet-50 works fine on a single GPU and saves a lot of GPU memory. But when I use data parallel, the GPU memory cost is the same for both requires_grad=False and requires_grad=True on the first three residual blocks.

oneTaken · October 23, 2017, 6:51am

Yes, there is a default value. You can see the default is 0.5 in the code.
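
A quick way to check this, as a minimal sketch that assumes the code in question builds the layer with nn.Dropout() and no arguments: the drop probability is stored on the module as the attribute p.

```python
import torch.nn as nn

# Dropout built without arguments falls back to the default drop probability.
default_dropout = nn.Dropout()
print(default_dropout.p)

# Pass p explicitly to use a different rate.
custom_dropout = nn.Dropout(p=0.2)
print(custom_dropout.p)
```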

oneTaken · October 23, 2017, 6:54am

@apaszke I saw the tutorial you posted, and I have a question.
If the optimizer is created with this code:

optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

Do I also need to add this code to make sure the other layers’ parameters don’t require gradients?

for param in model.parameters():
    param.requires_grad = False
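
The sketch below may help with this question. It uses a small hypothetical TinyNet (not the tutorial's model) to show two things: an optimizer built from model.fc.parameters() only ever updates fc, and autograd still computes gradients for the other layers unless they are frozen. So the loop is not needed for correctness, but it saves computation and memory. Note that a loop over model.parameters() freezes fc as well, so it would have to run before fc is replaced with a fresh layer.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)


# Hypothetical stand-in for a pretrained network: a "backbone" layer plus an "fc" head.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 8)
        self.fc = nn.Linear(8, 2)

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))


model = TinyNet()
backbone_before = model.backbone.weight.detach().clone()
fc_before = model.fc.weight.detach().clone()

# Only the head is handed to the optimizer; nothing is frozen explicitly.
optimizer = optim.SGD(model.fc.parameters(), lr=1e-2, momentum=0.9)

loss = model(torch.randn(16, 4)).pow(2).mean()
loss.backward()
optimizer.step()

# The backbone is unchanged because the optimizer never saw it ...
backbone_unchanged = torch.equal(model.backbone.weight, backbone_before)
# ... but autograd still computed and stored a gradient for it.
backbone_has_grad = model.backbone.weight.grad is not None
fc_changed = not torch.equal(model.fc.weight, fc_before)

# Freezing the backbone avoids that extra gradient computation.
for param in model.backbone.parameters():
    param.requires_grad = False
    param.grad = None

model(torch.randn(16, 4)).pow(2).mean().backward()
backbone_grad_after_freeze = model.backbone.weight.grad
```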

oneTaken · October 23, 2017, 7:11am

I read your tutorial carefully.
I noticed that you save the model with the ‘.pt’ extension. It is not clear to me when to save with ‘.pt’ and when to save with ‘.pth’.

SpandanMadan · October 23, 2017, 8:16am

Hi,

That’s just the extension of the file and anything will do. You may use .pt or .pth or anything else that you may want to, it doesn’t affect pytorch’s functioning.
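
As a small illustration of this point, the sketch below saves the same state_dict under three arbitrary file names (all hypothetical) and loads each one back. It assumes the usual torch.save / torch.load round trip of a state_dict.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(3, 1)

with tempfile.TemporaryDirectory() as tmp_dir:
    loaded = {}
    # The file names are arbitrary; the extension is only a naming convention.
    for file_name in ("weights.pt", "weights.pth", "weights.anything"):
        path = os.path.join(tmp_dir, file_name)
        torch.save(model.state_dict(), path)
        loaded[file_name] = torch.load(path)

# Every file round-trips to the same tensors regardless of its extension.
all_equal = all(
    torch.equal(state["weight"], model.weight) and torch.equal(state["bias"], model.bias)
    for state in loaded.values()
)
print(all_equal)
```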

oneTaken · October 23, 2017, 8:28am

I got it. I had been struggling with the file name for several days. Thanks so much.

Besides, I read your tutorial on GitHub along with your post, and I opened an issue.

micklexqg · November 1, 2017, 3:21am

I used imagenet/main.py for finetuning alexnet. Is there something I need to change? I just changed the model to alexnet,
but the result is strange: precision is very low (below 1 even after many epochs). So how should I finetune alexnet?

varghese_alex · November 27, 2017, 7:06am

@apaszke I am trying to fine-tune a resnet18. However, I would like to freeze all layers except for the classification layer and the convolution layer just before the average pooling.

Upon using the following code:

model_ft = models.resnet18(pretrained=True)

lt = 8
cntr = 0

for child in model_ft.children():
    cntr += 1
    if cntr < lt:
        # print child
        for param in child.parameters():
            param.requires_grad = False

num_ftrs = model_ft.fc.in_features
model_ft.fc = nn.Linear(num_ftrs, 2)

optimizer_ft = optim.SGD(filter(lambda p: p.requires_grad, model_ft.parameters()), lr=0.001, momentum=0.9)

With the code above, all the convolution units in the last block become trainable. How do I make sure that only the last convolution unit within this block, the avg pool, and the classification layer are trainable while the rest are frozen? Any suggestions?

jdhao · November 28, 2017, 1:47am

@varghese_alex, why don’t you format your code so that it is easier for people to read? If you do not make the effort to ask good questions, you are less likely to get answers.

For any latecomer who is interested: your model has a model.parameters() method, which returns an iterator over the model’s parameters. Convert that iterator to a list, find the index of the first parameter you want to fine-tune, and set the requires_grad attribute to False for every parameter before it. Here is concrete code for the vgg16 network, supposing you want to freeze the first 2 convolution groups. You can use the following code:

# fix the parameter for first 2 blocks of vgg16
for param in list(model.parameters())[:8]:
    param.requires_grad = False

# filter out the parameters you are going to fine-tune
params = filter(lambda p: p.requires_grad, model.parameters())

# only give parameters which require grad to the optimizer, or you will get an error
# complaining that some parameters do not require grad
optimizer = optim.SGD(params, lr=args.lr, momentum=args.momentum, weigth_decay=args.weight_decay)
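
To find the cut-off index without counting by hand, it can help to print each parameter with its position. The sketch below uses a small hypothetical VGG-style stack rather than the real vgg16, so the layer sizes are assumptions. Each convolution contributes a weight and a bias, which is why four convolutions occupy the first 8 entries.

```python
import torch.nn as nn

# Hypothetical miniature VGG-style network: two groups of two convolutions.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 8, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 8 * 8, 10),  # assumes 32x32 inputs
)

# List every parameter with its position so the cut-off index can be read off.
for index, (name, param) in enumerate(model.named_parameters()):
    print(index, name, tuple(param.shape))

# 4 convolutions x (weight, bias) = the first 8 parameters.
for param in list(model.parameters())[:8]:
    param.requires_grad = False

frozen = [name for name, p in model.named_parameters() if not p.requires_grad]
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
```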

miguelgfierro · December 17, 2017, 2:03pm

@apaszke I’ve been playing with finetuning vs. freezing everything but the last layer, and I find that there is not much difference in training time (17 min finetuning vs. 16 min training only the last layer) on the Simpsons dataset, whereas the accuracy is quite different (91% vs. 62%). Here you have the code and details. I tried other datasets with similar results.

Shouldn’t the freezing example be much faster?

SpandanMadan · August 12, 2018, 8:47am

This fine-tuning tutorial (which seems to have helped quite a few people) has not been updated for PyTorch 0.4.

Can someone fork it and add a pull request for the updated PyTorch 0.4? Here’s an issue regarding the same - https://github.com/Spandan-Madan/Pytorch_fine_tuning_Tutorial/issues/7

Let’s keep the resource working for others

Best,
Spandan

3a08078858321b0c74cc · September 1, 2018, 3:57am

There is a typo in the last line. It should be

optimizer = optim.SGD(params, lr=args.lr, momentum=args.momentum, weight_decay=args.weight_decay)

MUHAMMAD_SHOAIB · August 27, 2022, 5:31pm

Load a pretrained model - Resnet18

    print("\nLoading resnet18 for finetuning ...\n")
    model_ft = models.resnet18(pretrained=True)

    # Modify fc layers to match num_classes
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Linear(num_ftrs, num_classes)