How can I exclude some parameters in optimizer during training?

Hello.
I want to exclude some parameters from the optimizer during training.

Specifically, I want to freeze the updates of some layers during training.
(However, setting requires_grad=False by itself is not working.)

Before starting training, I construct an optimizer:

optimizer = torch.optim.SGD(net.parameters(), lr, momentum=momentum, weight_decay=decay, nesterov=True)

scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda step: cosine_annealing(
        step,
        schedule_epoch * len(train_loader),
        1,  # since lr_lambda computes multiplicative factor
        1e-6 / 0.1))

for epoch in range(total_epoch):
  ...
  # update the models

In this scenario, I want to freeze some layers (specifically, the front layers) after some epochs.
How can I exclude those parameters from the optimizer above?

Thank you.

How are you freezing the layers? Freezing is the only way to exclude parameters during training. In your example you have defined your optimizer over all of the model's parameters. When freezing, this is the way to set up your optimizer:

 optim = torch.optim.SGD(filter(lambda p: p.requires_grad, net.parameters()), lr, momentum=momentum, weight_decay=decay, nesterov=True) 

The filter doesn't make much of a difference for a plain SGD optimizer with only a learning rate, but since you are using momentum and weight decay, parameters whose requires_grad is set to False would still be updated by those terms if they remained in the optimizer.
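
To make that concrete, here is a small self-contained sketch (the toy model, sizes, and the train_step helper are mine, not from your code): a parameter that stays in an SGD optimizer with momentum and weight decay keeps changing after requires_grad is set to False, because the weight-decay term and the existing momentum buffer do not need a fresh gradient.

import torch
import torch.nn as nn

# Toy two-layer model; the sizes are arbitrary, just for illustration.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-2)
x, y = torch.randn(8, 4), torch.randn(8, 1)

def train_step():
    # set_to_none=False keeps .grad as a zero tensor instead of None,
    # so the optimizer does not simply skip the "frozen" parameters
    opt.zero_grad(set_to_none=False)
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

train_step()                                  # one normal step builds the momentum buffers
before = model[0].weight.detach().clone()

for p in model[0].parameters():               # "freeze" the first layer
    p.requires_grad = False

train_step()
print(torch.equal(before, model[0].weight))   # prints False: the frozen layer still moved

(If the gradients are instead reset with zero_grad(set_to_none=True), the stale .grad becomes None and SGD skips those parameters, which is why the behaviour can look inconsistent. Filtering the frozen parameters out of the optimizer avoids the issue entirely.)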


If I use the filter above,
don't I need to manually exclude the frozen params during training?

Or, after I change some parameters' requires_grad to False,
do I need to construct the optimizer above again?


You freeze the parameters manually before training. For example, if I want to freeze the first 3 layers of a ResNet encoder, I use:

# Freeze the first few sub-modules of the first child (the encoder) before training.
for i, child in enumerate(model.children()):
    for k, child_0 in enumerate(child.children()):
        if k <= 3:
            for params in child_0.parameters():
                params.requires_grad = False
            print("Frozen layer {}: {}".format(k, child_0))
    break  # only the first top-level child is inspected in this particular encoder

This is a small snippet I built based on my own encoder. You have to come up with your own freezing loop to freeze custom layers (a name-based variant is sketched below). Note that all of this happens before training begins.
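
For instance, a loop over named_parameters() is sometimes easier to adapt than nested children() calls; the prefixes below are only placeholders, not names taken from your model:

# Freeze every parameter whose name starts with one of the given prefixes.
frozen_prefixes = ("conv1", "bn1", "layer1")   # assumed names; adapt to your model
for name, param in model.named_parameters():
    if name.startswith(frozen_prefixes):
        param.requires_grad = False
        print("Frozen parameter:", name)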

A very good idea is to put this freezing step right after you have defined the model. After that, you define the optimizer as

optim = torch.optim.SGD(filter(lambda p: p.requires_grad, model.parameters()), lr, momentum=momentum, weight_decay=decay, nesterov=True) 

and you are good to go! You can use this model in the training loop and the frozen layers will not have their weights updated.
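
If you do want to freeze only after some epochs, as in the original question, one option is to flip requires_grad at the chosen epoch and then rebuild the optimizer and scheduler so the frozen parameters (and their momentum buffers) drop out. This is only a sketch: freeze_epoch and front_layers are placeholders you would replace with your own schedule and modules.

freeze_epoch = 10                              # hypothetical epoch at which to freeze
for epoch in range(total_epoch):
    if epoch == freeze_epoch:
        for p in front_layers.parameters():    # placeholder for the layers you want frozen
            p.requires_grad = False
        optimizer = torch.optim.SGD(
            filter(lambda p: p.requires_grad, net.parameters()),
            lr, momentum=momentum, weight_decay=decay, nesterov=True)
        scheduler = torch.optim.lr_scheduler.LambdaLR(
            optimizer,
            lr_lambda=lambda step: cosine_annealing(
                step,
                schedule_epoch * len(train_loader),
                1,
                1e-6 / 0.1))
    ...
    # update the model as before

Note that the rebuilt LambdaLR starts counting steps from zero again, so if you want the cosine schedule to continue where it left off you would need to track the global step yourself and offset it inside cosine_annealing.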
