Changing `requires_grad` during training

I understand that tensors with requires_grad=False do not compute gradients. I wonder if I can change the requires_grad flag during training so as to train/freeze parts of the network on the fly: I want to train some parameters when certain conditions are met and freeze them when they are not. An example is below. When I change this flag, will the optimizer reflect it automatically, or do I have to do something to let the optimizer know about it?

import torch.optim as optim
....
optimizer = optim.Adam(model.parameters(), lr=lr)
....
for epoch in range(total_epochs):
    .... training goes on ....

    if certain_condition:
        # 'certain_parameter' is a model parameter from model.parameters().
        certain_parameter.requires_grad = False
    else:
        certain_parameter.requires_grad = True

    .... training goes on ....

Hi,

This will work in the sense that the gradients of these parameters will stay at 0.
Be careful though: many optimizers will still update the weights even for a gradient of 0, for example if you use regularization (weight decay) or momentum.
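
To make that concrete, here is a minimal sketch (not from the original posts) showing that a parameter with a gradient of exactly 0 can still change after step() when weight decay is used:

import torch
import torch.optim as optim

# A zero gradient does not guarantee the parameter stays fixed when the
# optimizer applies weight decay (or has accumulated momentum).
param = torch.nn.Parameter(torch.ones(3))
optimizer = optim.SGD([param], lr=0.1, weight_decay=0.1)

param.grad = torch.zeros_like(param)  # gradient is exactly 0
optimizer.step()
print(param)  # values shrink because of weight decay, despite the 0 gradient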

Thanks! I had the same question.
What do you suggest in case one wants to completely freeze the parameters?

The best way is not to give them to the optimizer in the first place, I think.
Or, if you want to update them sometimes, have two optimizers: one that you step() at every iteration and one that you step() only when you want to update these special parameters.
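
For example, a rough sketch of the two-optimizer idea (the model, the choice of "special" parameters and the condition are all made up for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)

# Suppose model.bias is the "special" parameter that should only be updated
# under certain conditions; every other parameter is always updated.
special_params = [model.bias]
regular_params = [p for p in model.parameters() if p is not model.bias]

optimizer_regular = optim.Adam(regular_params, lr=1e-3)
optimizer_special = optim.Adam(special_params, lr=1e-3)

for step in range(100):
    x = torch.randn(8, 10)
    loss = model(x).sum()

    optimizer_regular.zero_grad()
    optimizer_special.zero_grad()
    loss.backward()

    optimizer_regular.step()      # always update the regular parameters
    if step % 10 == 0:            # hypothetical condition
        optimizer_special.step()  # only sometimes update the special ones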

However, is there an easier way to handle the following situation?
The model has three parameters (parameter_1, parameter_2 and parameter_3).
At first, the model needs to update all of them.
During training, sometimes parameter_1 and parameter_2 should be updated, and sometimes parameter_1 and parameter_3.
So it seems we need three optimizers:
optimizer_1 -> parameter_1, parameter_2 and parameter_3
optimizer_2 -> parameter_1 and parameter_2
optimizer_3 -> parameter_1 and parameter_3
But how do we deal with this when there are more parameters?
I hope my explanation is clear.

Hi,

You don’t want the same parameters shared between multiple optimizers, as the internal state kept for example by Adam or momentum will be wrong.
You can create one optimizer for each set of parameters and .step() all the ones that need to be updated at that iteration.
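
A rough sketch of that suggestion for the three-parameter scenario above (parameter_1/2/3 and the alternation condition are placeholders):

import torch
import torch.optim as optim

parameter_1 = torch.nn.Parameter(torch.randn(5))
parameter_2 = torch.nn.Parameter(torch.randn(5))
parameter_3 = torch.nn.Parameter(torch.randn(5))

# One optimizer per parameter set, so no Adam state is ever shared.
optimizers = {
    "p1": optim.Adam([parameter_1], lr=1e-3),
    "p2": optim.Adam([parameter_2], lr=1e-3),
    "p3": optim.Adam([parameter_3], lr=1e-3),
}

for step in range(100):
    loss = (parameter_1 + parameter_2 + parameter_3).pow(2).sum()
    for opt in optimizers.values():
        opt.zero_grad()
    loss.backward()

    # Decide which sets to update this iteration (the condition is made up).
    active = ["p1", "p2"] if step % 2 == 0 else ["p1", "p3"]
    for name in active:
        optimizers[name].step()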

I see what you mean.
However, the number of optimizers will grow when there are many sets of parameters, which adds some complexity.

@albanD, I had the same issue: setting param.grad = None forces the optimizer to skip that parameter's update at step(). Thus, optimizers with momentum will also keep those layers' parameters fixed.
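
A minimal sketch of that approach (the model and the frozen_params list are just for illustration):

import torch
import torch.optim as optim

model = torch.nn.Linear(10, 2)
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
frozen_params = [model.bias]  # hypothetical choice of what to freeze

x = torch.randn(8, 10)
loss = model(x).sum()

optimizer.zero_grad()
loss.backward()

# Parameters whose .grad is None are skipped by step(), even with momentum.
for p in frozen_params:
    p.grad = None

optimizer.step()  # model.bias is left unchanged this iteration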
