Same weight parameters given twice to optimizer

I have the following network

resnet18 = models.resnet18(pretrained=True)
fc_ftrs = resnet18.fc.in_features
resnet18.fc = nn.Linear(fc_ftrs,self.numClasses)

I want to use a small learning rate for the base of my network (finetuning) and a different one for the fully connected layer.

If my optimizer is defined as follows :

RMSprop([{'params': resnet18.parameters(), 'lr': 1e-6}, {'params': resnet18.fc.parameters(), 'lr': 5e-4}])

Will both learning rates be added up for the fully connected layer, or will the second override the first?

[Edit: The original suggestion is broken, my apologies, see below!]
Did you actually try?
Any recent version of PyTorch should give you an error.
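
A minimal sketch of that check, assuming a small stand-in model (the hypothetical TinyNet below, not resnet18) so that it runs without torchvision or pretrained weights. In recent PyTorch releases the optimizer constructor should reject overlapping groups with a ValueError, rather than adding the learning rates or letting one override the other:

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for resnet18: a small base plus an `fc` head."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.base = nn.Linear(8, 4)
        self.fc = nn.Linear(4, num_classes)

    def forward(self, x):
        return self.fc(torch.relu(self.base(x)))


model = TinyNet()

try:
    # model.parameters() already contains the fc parameters, so they
    # would end up in both groups.
    torch.optim.RMSprop([
        {'params': model.parameters(), 'lr': 1e-6},
        {'params': model.fc.parameters(), 'lr': 5e-4},
    ])
    duplicate_error = None
except ValueError as err:
    duplicate_error = err

print(duplicate_error)
```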

You can get rid of the duplicates by using

fc_params = list(resnet18.fc.parameters())
other_params = [p for p in resnet18.parameters() if p not in fc_params]

or so.

Best regards



I was actually gonna try it, your method is pretty simple!
Thanks will use this!

Hi, I tried your approach
and I get the following error:

RuntimeError: The size of tensor a (7) must match the size of tensor b (2048) at non-singleton dimension 3

The 7 sized tensor is probably due to the 7x7 convolution in resnet!

For now I have resolved it by using list(resnet.parameters())[:-2] for the base parameters.
Is there any other suggested approach for this?
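
A short sketch of why the slice works, assuming a hypothetical stand-in model (TinyNet, not resnet18) whose last registered submodule is a Linear layer named fc with a bias. parameters() yields parameters in registration order, so the final two entries are fc.weight and fc.bias:

```python
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in: one 7x7 convolution followed by an `fc` head."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=7)
        self.fc = nn.Linear(4, num_classes)  # registered last


model = TinyNet()

# Parameter names in registration order; the last two belong to fc.
names = [n for n, _ in model.named_parameters()]
print(names[-2:])

base_params = list(model.parameters())[:-2]  # everything except fc
head_params = list(model.parameters())[-2:]  # fc.weight and fc.bias
```

The slice depends on fc being registered last and having both a weight and a bias, so it breaks silently if the model definition changes.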

Sorry for posting a wrong solution before.
The reason it does not work as expected is that Python’s "in" operator compares list items with ==, which for tensors is an elementwise comparison (hence the size mismatch error) and not an identity check.
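
A small sketch of the failure, using two plain tensors with made-up shapes in place of real model parameters. The id()-based filter at the end is one possible identity check, shown here as an assumption and not something taken from the posts above:

```python
import torch

a = torch.zeros(2, 3)
b = torch.zeros(4, 5)

try:
    # For each list item, `in` checks identity first and then falls back
    # to ==. Here b is not a, so a == b is evaluated elementwise and fails
    # because the two shapes cannot be broadcast together.
    found = b in [a]
    membership_error = None
except RuntimeError as err:
    membership_error = err

print(membership_error)

# Comparing by object identity avoids == on tensors entirely.
excluded_ids = {id(a)}
kept = [t for t in (a, b) if id(t) not in excluded_ids]
```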

Using the parameter names will work (you could also hack around it by keeping a set of p.data_ptr() and filter by that, but that is ugly…):

fc_params = [p for n,p in m.named_parameters() if n.startswith('fc.')]
other_params = [p for n,p in m.named_parameters() if not n.startswith('fc.')]
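
Putting the name-based split together with the optimizer could look like the sketch below. It assumes a hypothetical stand-in model (TinyNet) with a head named fc, and reuses the two learning rates from the question:

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for resnet18: a small base plus an `fc` head."""

    def __init__(self, num_classes=3):
        super().__init__()
        self.base = nn.Linear(8, 4)
        self.fc = nn.Linear(4, num_classes)

    def forward(self, x):
        return self.fc(torch.relu(self.base(x)))


model = TinyNet()

# Split by parameter name: everything under `fc.` is the head.
fc_params = [p for n, p in model.named_parameters() if n.startswith('fc.')]
other_params = [p for n, p in model.named_parameters() if not n.startswith('fc.')]

# Two disjoint groups, each with its own learning rate.
optimizer = torch.optim.RMSprop([
    {'params': other_params, 'lr': 1e-6},
    {'params': fc_params, 'lr': 5e-4},
])

for group in optimizer.param_groups:
    print(len(group['params']), group['lr'])
```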

Best regards


Do you think it would be simpler if, instead of raising the error "parameters appear in more than one parameter group", the optimizer supported overriding the learning rate?
For my use case it would have been much simpler. Of course it creates the possibility of mistakes on the user's side, but in my opinion it is more positive than negative!

To be honest, I think that it is a very special application where you need this and don’t have it conveniently available.
For example, (I think) the library (Jeremy Howard advocates a graded learning rate for finetuning) sticks the various modules in a Sequential module and then gets the parameter groups by iterating over the submodules.
The other option is to use the parameter names. There probably are more elegant solutions than the above if you need it in a systematic way.
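
A rough sketch of the Sequential idea, not the actual implementation of any particular library: the layer sizes and the graded learning rates below are made up for illustration, and each submodule that has parameters gets its own group.

```python
import torch
from torch import nn

# Hypothetical model built as a Sequential of stages.
model = nn.Sequential(
    nn.Linear(8, 8),
    nn.ReLU(),
    nn.Linear(8, 4),
    nn.ReLU(),
    nn.Linear(4, 3),
)

# Assumed graded learning rates, smallest for the earliest stage.
stage_lrs = [1e-6, 1e-5, 5e-4]

# Keep only the submodules that actually own parameters (skips ReLU).
stages = [m for m in model.children() if list(m.parameters())]

# One parameter group per stage; the groups are disjoint by construction.
param_groups = [
    {'params': list(stage.parameters()), 'lr': lr}
    for stage, lr in zip(stages, stage_lrs)
]
optimizer = torch.optim.RMSprop(param_groups)

for group in optimizer.param_groups:
    print(len(group['params']), group['lr'])
```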