How can I exclude some parameters in optimizer during training?

Hello.
I want to exclude some parameters from the optimizer during training.

Specifically, I want to freeze the updates of some layers during training.
(However, setting requires_grad=False by itself is not working.)

Before starting training, I construct an optimizer:

optimizer = torch.optim.SGD(net.parameters(), lr, momentum=momentum, weight_decay=decay, nesterov=True)

scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: cosine_annealing(
        step,
        schedule_epoch * len(train_loader),
        1,  # since lr_lambda computes multiplicative factor
        1e-6 / 0.1))

for epoch in range(total_epoch):
  ...
  # update the models

In this scenario, I want to freeze some layers (specifically, the front layers) after some epochs.
How can I exclude those parameters from the optimizer above?

Thank you.

How are you freezing the layers? Freezing is the only way to exclude parameters during training. In your example you have defined your optimizer over all of the model's parameters. When freezing, this is the way to set up your optimizer:

 optim = torch.optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr, momentum=momentum, weight_decay=decay, nesterov=True) 

The filter doesn't make much of a difference for a plain SGD optimizer with only a learning rate, but since you are using momentum and weight decay, parameters whose requires_grad is set to False would still be updated by those terms if they remained in the optimizer.
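
To make that concrete, here is a small self-contained sketch (the toy model, sizes, and the train_step helper are mine, not from your code): a parameter that stays in an SGD optimizer with momentum and weight decay keeps changing after requires_grad is set to False, because the weight-decay term and the existing momentum buffer do not need a fresh gradient.

import torch
import torch.nn as nn

# Toy two-layer model; the sizes are arbitrary, just for illustration.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-2)
x, y = torch.randn(8, 4), torch.randn(8, 1)

def train_step():
    # set_to_none=False keeps .grad as a zero tensor instead of None,
    # so the optimizer does not simply skip the "frozen" parameters
    opt.zero_grad(set_to_none=False)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

train_step()                                  # one normal step builds the momentum buffers
before = model[0].weight.detach().clone()

for p in model[0].parameters():               # "freeze" the first layer
    p.requires_grad = False

train_step()
print(torch.equal(before, model[0].weight))   # prints False: the frozen layer still moved

(If the gradients are instead reset with zero_grad(set_to_none=True), the stale .grad becomes None and SGD skips those parameters, which is why the behaviour can look inconsistent. Filtering the frozen parameters out of the optimizer avoids the issue entirely.)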


If I use the filter above,
don't I need to manually exclude the frozen params during training?

Or, after I change some parameters' requires_grad to False,
do I need to construct the optimizer above again?


You freeze the parameters manually before training. For example, if I want to freeze the first 3 layers of a ResNet encoder, I use:

# Freeze the first few sub-modules of the first child (the encoder) before training.
for i, child in enumerate(model.children()):
    for k, child_0 in enumerate(child.children()):
        if k <= 3:
            for params in child_0.parameters():
                params.requires_grad = False
            print("Frozen layer {}: {}".format(k, child_0))
    break  # only the first top-level child is inspected in this particular encoder

This is a small snippet I built based on my own encoder. You have to come up with your own freezing loop to freeze custom layers (a name-based variant is sketched below). Note that all of this happens before training begins.
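
For instance, a loop over named_parameters() is sometimes easier to adapt than nested children() calls; the prefixes below are only placeholders, not names taken from your model:

# Freeze every parameter whose name starts with one of the given prefixes.
frozen_prefixes = ("conv1", "bn1", "layer1")   # assumed names; adapt to your model
for name, param in model.named_parameters():
    if name.startswith(frozen_prefixes):
        param.requires_grad = False
        print("Frozen parameter:", name)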

A very good idea is to put this freezing step right after you have defined the model. After that, you define the optimizer as

optim = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr, momentum=momentum, weight_decay=decay, nesterov=True) 

and you are good to go! You can use this model in the training loop and the frozen layers will not have their weights updated.
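
If you do want to freeze only after some epochs, as in the original question, one option is to flip requires_grad at the chosen epoch and then rebuild the optimizer and scheduler so the frozen parameters (and their momentum buffers) drop out. This is only a sketch: freeze_epoch and front_layers are placeholders you would replace with your own schedule and modules.

freeze_epoch = 10                              # hypothetical epoch at which to freeze
for epoch in range(total_epoch):
    if epoch == freeze_epoch:
        for p in front_layers.parameters():    # placeholder for the layers you want frozen
            p.requires_grad = False
        optimizer = torch.optim.SGD(
            filter(lambda p: p.requires_grad, net.parameters()),
            lr, momentum=momentum, weight_decay=decay, nesterov=True)
        scheduler = torch.optim.lr_scheduler.LambdaLR(
            optimizer,
            lr_lambda=lambda step: cosine_annealing(
                step,
                schedule_epoch * len(train_loader),
                1,
                1e-6 / 0.1))
    ...
    # update the model as before

Note that the rebuilt LambdaLR starts counting steps from zero again, so if you want the cosine schedule to continue where it left off you would need to track the global step yourself and offset it inside cosine_annealing.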
