How to pass certain layers' weights to the optimizer

Let's say I have a model:

from torchvision import models
model = models.resnet18(pretrained=True)

Now I freeze all the layers:

for param in model.parameters():
    param.requires_grad = False

and then pass all the model's parameters to the optimizer:

optimizer = optim.Adam(model.parameters(), lr=0.1)

Would the optimizer do nothing, because all the model parameters have requires_grad = False?

thanks in advance :slight_smile:



So, what kind of task do you want to do? You are absolutely correct: as all the parameters have their requires_grad set to False, the optimizer has nothing to update.

If you only want to fine-tune, you can just replace the last layer according to the number of output classes that you have:

import torch.nn as nn

model = models.resnet18(pretrained=True)
for param in model.parameters():
    param.requires_grad = False
num_filters = model.fc.in_features
model.fc = nn.Linear(num_filters, num_classes)  # num_classes = your number of output classes

Rest of the pipeline remains the same.

However, if you want to actually turn training back on for particular layers, it can be a bit tricky.

One way of keeping track is to print the different modules in the ResNet model using the following code:

for child in model_ft.children():
    print(child)

You will be able to analyze the various blocks.

One step further would be actually choosing the parameters and enabling requires_grad for that particular layer.

for child in model_ft.children():
    for parameter in child.parameters():
        parameter.requires_grad = True

However, you have to carefully make sure that you are enabling the correct parameters.


So once requires_grad has been set to True/False for the specific layers, we can simply pass the whole model's parameters to the optimizer, right?


optimizer = optim.Adam(model.parameters(), lr=0.1) ???

Even if all the parameters have been passed to the optimizer, only those parameters will get updated which have their requires_grad set to True…

Please correct me if I am wrong, and thanks yaar for the explanation!!! :smiley:

Yes, that is correct.
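This is easy to verify on a toy model (a stand-in for the ResNet, so the check runs instantly): freeze one layer, take an optimizer step over all parameters, and compare weights before and after.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# freeze the first layer only
for p in model[0].parameters():
    p.requires_grad = False

frozen_before = model[0].weight.clone()
trainable_before = model[1].weight.clone()

# pass ALL parameters to the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.1)
model(torch.randn(8, 4)).sum().backward()
optimizer.step()

print(torch.equal(model[0].weight, frozen_before))     # True:  frozen layer untouched
print(torch.equal(model[1].weight, trainable_before))  # False: trainable layer updated
```

The frozen parameters never receive a `.grad`, so the optimizer skips them even though they were passed in.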


Thanks Saransh, that was very helpful! : )

I am glad that I could be of help


Can you please tell me how to set requires_grad = False or True depending on the layer name of the model?

I know that the following code sets all the parameters to False or True:

for param in model.parameters():
    param.requires_grad = False

I want to set requires_grad based on the layer name…

How can that be done?

Search for named_parameters…

Here's how it's done:

print('Training these layers')
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name, param.requires_grad)

I hope you can flip the requires_grad as per your needs…

Secondly, don't pass model.parameters() like that; rather, filter out the ones which have requires_grad set to True… (looks cool)


Thanks for showing me the named_parameters thing. But can you please explain in greater detail how I can pass the parameters to the optimizer, once I have set requires_grad as per my needs?


Everything remains the same, except that you actually selectively switch on / off the layers from the pretrained model whose gradients are to be considered in the backward pass.

Try to code it up and see for yourself; you don't need to manually specify a particular layer to be used by your optimizer, everything is taken care of by PyTorch.

Yep, exactly! Do what Saransh said!

Can you please tell me when, where, and why we save and load the optimizer's state_dict? I mean, if I want to resume a model's training for a few more epochs, do I really need to load the optimizer's state_dict, or can I just initialize the optimizer once again? What exactly does the optimizer's state_dict contain?
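For reference, the optimizer's state_dict holds its hyperparameters (the param_groups) plus per-parameter state such as SGD's momentum buffers or Adam's running moment estimates; re-initializing the optimizer instead of loading it throws that state away, which can briefly disturb training when you resume. A minimal save/resume sketch (the toy model and file name are placeholders):

```python
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(4, 2)  # placeholder for your real model
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# ... train for some epochs ...

# save both state_dicts in one checkpoint file
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pth')

# later, to resume: rebuild the objects, then restore their state
checkpoint = torch.load('checkpoint.pth')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])
```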

optimizer = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr=0.0006, momentum=0.9)

Is this what you were talking about? If yes, please also explain what would happen if I pass the parameters as
optimizer = torch.optim.SGD(model.parameters(), lr=0.0006, momentum=0.9) ???

Both do the same thing here: the frozen parameters never receive a gradient, so the optimizer skips them either way. It's just a matter of which style one prefers!

So the only difference is the way the instruction is written, right? Both are essentially doing the same thing, and there wouldn't be any difference in the output.
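A quick sanity check of that claim, using a small stand-in model instead of the ResNet so the two variants can be compared directly:

```python
import torch
import torch.nn as nn
import torch.optim as optim

def make_model():
    # identical seeded model with the first layer frozen
    torch.manual_seed(0)
    m = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
    for p in m[0].parameters():
        p.requires_grad = False
    return m

def train_step(model, optimizer):
    # one step on identical seeded data
    torch.manual_seed(1)
    model(torch.randn(8, 4)).sum().backward()
    optimizer.step()

# variant 1: pass everything; frozen params never get a .grad, so they are skipped
m1 = make_model()
train_step(m1, optim.SGD(m1.parameters(), lr=0.1, momentum=0.9))

# variant 2: pass only the trainable parameters
m2 = make_model()
train_step(m2, optim.SGD(filter(lambda p: p.requires_grad, m2.parameters()),
                         lr=0.1, momentum=0.9))

print(all(torch.equal(a, b) for a, b in zip(m1.parameters(), m2.parameters())))  # True
```

Both variants end up with identical weights. The filtered version is mainly a readability choice, and it also avoids carrying the frozen parameters around in the optimizer's param_groups.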