So, what kind of task do you want to perform? You are absolutely correct: all the parameters have their requires_grad set to False.
If you only want to fine-tune, you can just change the last layer according to the number of output classes that you have:
import torch
import torch.nn as nn
from torchvision import models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet18(pretrained=True)
for param in model.parameters():        # freeze all pretrained weights
    param.requires_grad = False
num_filters = model.fc.in_features
model.fc = nn.Linear(num_filters, num_classes)   # num_classes: number of output classes for your task
model = model.to(device)
The rest of the pipeline remains the same.
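For example, something along these lines (the optimizer choice, learning rate and the train_loader name are just placeholders, not from the original post):

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
# only model.fc has requires_grad=True, so only those weights get updated
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)

for inputs, labels in train_loader:      # train_loader: your DataLoader
    inputs, labels = inputs.to(device), labels.to(device)
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()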
However, if you want to actually turn training back on for particular layers, it can be a bit tricky.
One way of keeping track is to print the different modules in the ResNet model using the following code:
for child in model_ft.children():
    print(child)
You will be able to analyze the various blocks.
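If you also want the names of those blocks (handy when you later decide which ones to unfreeze), named_children() returns the name together with the module; for a torchvision ResNet the top-level children are conv1, bn1, relu, maxpool, layer1 to layer4, avgpool and fc:

for name, child in model_ft.named_children():
    print(name, type(child).__name__)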
One step further would be to actually choose specific layers and enable requires_grad only for their parameters.
for child in model_ft.children():          # as written, this unfreezes every child module
    for parameter in child.parameters():
        parameter.requires_grad = True
However, you have to carefully make sure that you are enabling the correct parameters.
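As a minimal sketch (assuming, purely for illustration, that you want to train only the last residual block layer4 and the new fc layer), you can select children by name and then double-check with named_parameters():

for name, child in model_ft.named_children():
    if name in ("layer4", "fc"):        # illustrative choice of layers to unfreeze
        for parameter in child.parameters():
            parameter.requires_grad = True

# verify which parameters will actually receive gradients
for name, parameter in model_ft.named_parameters():
    print(name, parameter.requires_grad)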
Thanks for showing me the named_parameters thing. But can you please explain in greater detail how I can pass the parameters to the optimizer once I have set requires_grad as per my needs?
Everything remains the same, except that you selectively switch on / off the layers from the pretrained model whose gradients are to be considered in the backward pass.
Try to code it up and see for yourself: you don't need to manually specify which layers your optimizer should use; everything is taken care of by PyTorch.
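If you prefer to be explicit anyway, one common pattern (not required) is to hand the optimizer only the parameters that currently require gradients; the lr and momentum values here are just illustrative:

optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=0.0006, momentum=0.9
)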
Can you please tell me when, where and why we save and load the optimizer's state_dict? I mean, if I want to resume the model's training for a few more epochs, do I really need to load the optimizer's state_dict, or can I just initialize the optimizer once again? What exactly does the optimizer's state_dict contain?
Is this what you were talking about? If yes, please also explain what would happen if I pass the parameters as optimizer = torch.optim.SGD(model.parameters(), lr=0.0006, momentum=0.9).
So the only difference is the way the instruction is written, right? Both are essentially doing the same thing, so there wouldn't be any difference in the output.