When I want to do transfer learning, I set requires_grad = False on the parameters I want to freeze and only pass the remaining parameters to the optimizer (a minimal sketch follows below). But what happens if I only do one of these two steps?
- What happens if I set several modules' requires_grad = False but then pass all net.parameters() to the optimizer?
- What happens if I keep requires_grad = True but only pass a subset of the parameters to the optimizer (the subset I want to fine-tune, for example)?
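For reference, here is a minimal sketch of what I mean by the two steps, assuming a torchvision ResNet-18 where only the fc head should be fine-tuned (the model and layer names are just an example):

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Step 1: freeze everything except the classifier head (model.fc)
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# Step 2: pass only the still-trainable parameters to the optimizer
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```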
Thank you @ptrblck.
Does it mean that (1) and "setting requires_grad = False and only passing the other parameters to the optimizer" are equivalent?
They will yield equivalent results, yes.
Filtering out the parameters is more explicit, which could increase code readability, and it also avoids iterating over parameters without a grad attribute in the step() method.
Approach (1) would allow you to unfreeze the parameters later in training and train them directly, without having to add them via add_param_group().
I guess it might depend a bit on your coding style and on your exact use case (e.g. are you planning on unfreezing or not?).
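As a rough sketch of approach (1), assuming the same ResNet-18 example as above and reading (1) as "set requires_grad = False but pass all parameters to the optimizer" (the model and layer names are only illustrative):

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Freeze the backbone, but pass *all* parameters to the optimizer (approach (1)).
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# Frozen parameters never receive a .grad during backward, so optimizer.step()
# simply skips them.

# Later in training: unfreeze e.g. the last residual block directly,
# without calling optimizer.add_param_group().
for param in model.layer4.parameters():
    param.requires_grad = True
```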
Thanks, @ptrblck, as usual very informative and precise answers. Maybe some of this info should be added to this tutorial: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
This might make sense. Would you be interested in creating an issue to discuss the current shortcomings and create a fix for it after the discussion with the code owners? 
@ptrblck @Valerio_Biscione did you happen to create an issue to integrate this into the docs? I think this is still not mentioned in the finetuning tutorial linked above.
No, I did not. Would you be interested in adding information that you think is missing to the docs?