When training, I'm trying to freeze one layer in the net, but I get an error:

optimizer = torch.optim.SGD([{'params': net.parameters(), 'lr': 0.1},
                             {'params': net.conv_r1.bias, 'lr': 0.0}], lr=0.1, momentum=0.9)

I use the above code to freeze net.conv_r1.bias, but I get:

ValueError: some parameters appear in more than one parameter group

So, what should I do to freeze only net.conv_r1.bias while training all the other parameters in the net?

Thanks!

Hi there!
I wonder: if you just delete net.parameters() from the SGD groups, how could SGD still train the whole network, since those parameters would no longer be in the optimizer's 'params'?

# Set conditional learning rates if necessary
model_parameters = []
for n, p in model.named_parameters():
    if n.find('layer_name') != -1:
        # parameters of the targeted layer get their own rate
        # (SPECIAL_LR is a placeholder; use 0.0 to freeze them)
        model_parameters.append({'params': p, 'lr': SPECIAL_LR})
    else:
        # every other parameter keeps the default rate
        model_parameters.append({'params': p, 'lr': LR})

optimizer = torch.optim.SGD(model_parameters, lr=LR, weight_decay=WEIGHT_DECAY)
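For the original question, a minimal sketch of the same pattern (assuming the parameter to freeze is named 'conv_r1.bias', as in the first post) could look like this:

# Sketch: freeze conv_r1.bias by giving it a learning rate of 0.0 and
# train every other parameter with lr 0.1 (values taken from the first post).
model_parameters = []
for n, p in net.named_parameters():
    if n == 'conv_r1.bias':
        model_parameters.append({'params': p, 'lr': 0.0})
    else:
        model_parameters.append({'params': p, 'lr': 0.1})

optimizer = torch.optim.SGD(model_parameters, lr=0.1, momentum=0.9)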

Thanks Juan, but in this way,

optimizer = torch.optim.SGD(model_parameters, lr=LR, weight_decay=WEIGHT_DECAY)

the lr in SGD() doesn't seem to do anything, right? Since you already pre-set an lr for every parameter, the

lr=LR,

argument is never actually used.

That's right, but I believe it's a mandatory argument.
Anyway, you can do it however you want; the point is, as you can see, that you cannot repeat a parameter. Arrange the groups however you prefer.
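For reference, here is a hedged sketch of the original two-group setup with the duplicate removed, so that conv_r1.bias appears in exactly one group:

# Exclude conv_r1.bias from the first group; every parameter then belongs to
# exactly one group, which avoids the "more than one parameter group" error.
bias_id = id(net.conv_r1.bias)
optimizer = torch.optim.SGD(
    [{'params': [p for p in net.parameters() if id(p) != bias_id], 'lr': 0.1},
     {'params': [net.conv_r1.bias], 'lr': 0.0}],
    lr=0.1, momentum=0.9)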


Thanks, I'll try your method to solve this problem.
Thanks for the help!

Note that, if you have subnetworks, you can apply this approach to only one of them by calling model.subnetwork.named_parameters(). The way I presented the solution is the most general one.
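For instance (model.subnetwork is just a placeholder name here), the parameter names you get back are relative to the submodule you iterate over:

# Hypothetical: iterating only one submodule yields names relative to it,
# e.g. 'conv1.weight' rather than 'subnetwork.conv1.weight', so any name check
# in the loop has to match the shorter form.
for n, p in model.subnetwork.named_parameters():
    print(n, p.shape)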

For example, here is another (equivalent) solution, proposed by ptrblck:

optim.SGD([{'params': [param for name, param in model.named_parameters() if 'fc2' not in name]},
           {'params': model.fc2.parameters(), 'lr': 5e-3}], lr=1e-2)

It filters with an if statement but does not set a learning rate for the first group. This way, the parameters without a specific learning rate will use the global one.
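As a quick check (fc1 and fc2 are placeholder submodule names), a group that omits the 'lr' key picks up the value passed to the constructor:

# Groups without an explicit 'lr' inherit the optimizer's default learning rate.
optimizer = torch.optim.SGD(
    [{'params': model.fc1.parameters()},               # no 'lr' -> uses the global 1e-2
     {'params': model.fc2.parameters(), 'lr': 5e-3}],
    lr=1e-2)
for i, group in enumerate(optimizer.param_groups):
    print(i, group['lr'])   # prints 0 0.01 and 1 0.005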


Wow, thanks to both you and ptrblck, this method seems perfect!