Generally, when we fine-tune a classifier by using a pre-trained model purely as a feature extractor, we set
requires_grad = False for the pre-trained block and train only the newly added FC layer.
For example, see the code snippet below:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Setting up the model
# Note that the parameters of imported models are set to requires_grad=True by default
res_mod = models.resnet34(pretrained=True)
for param in res_mod.parameters():
    param.requires_grad = False

# Replace the final FC layer with a new, trainable 2-class head
num_ftrs = res_mod.fc.in_features
res_mod.fc = nn.Linear(num_ftrs, 2)
res_mod = res_mod.to(device)

criterion = nn.CrossEntropyLoss()

# Here's another change: instead of all parameters being optimized,
# only the params of the final layer are being optimized
optimizer_ft = optim.SGD(res_mod.fc.parameters(), lr=0.001, momentum=0.9)

# Decay the learning rate by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=7, gamma=0.1)
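As a quick sanity check (my own addition, not part of the original snippet), we can list which parameters will still receive gradients; only the new FC head remains trainable:

# List the names of all parameters that still require gradients
trainable = [name for name, p in res_mod.named_parameters() if p.requires_grad]
print(trainable)  # ['fc.weight', 'fc.bias']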
My question is: is it necessary to set
requires_grad=False when fine-tuning, given that we already specify in
optimizer_ft exactly which parameters need updating, i.e. the last FC layer's params?
I know leaving requires_grad=True would be a computational disaster (backward would still run through the whole frozen backbone) and it shouldn’t be done this way, but I am just asking out of curiosity.
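To make the question concrete, here is a minimal sketch (the batch size and input shape are placeholders I chose) showing what happens to the backbone's gradients under the setup above:

# Dummy forward/backward pass to inspect gradient behaviour
inputs = torch.randn(4, 3, 224, 224, device=device)  # placeholder batch
labels = torch.randint(0, 2, (4,), device=device)    # placeholder 2-class labels

loss = criterion(res_mod(inputs), labels)
loss.backward()

# With requires_grad=False, autograd never computes gradients for the
# backbone, so its .grad attributes stay None; with requires_grad=True
# they would be computed and stored even though optimizer_ft ignores them.
print(res_mod.conv1.weight.grad)       # None  (backbone frozen)
print(res_mod.fc.weight.grad is None)  # False (FC head gets gradients)

My understanding is that passing only the FC params to the optimizer controls which weights get updated, while requires_grad controls which gradients get computed at all, hence the waste I mentioned; is that right?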