Passing frozen parameters to the optimizer

When I want to do transfer learning, I set requires_grad = False on the layers I want to freeze and only pass the remaining parameters to the optimizer (sketched below). But what happens if I only do one of these two steps?

  1. What happens if I set several modules’ requires_grad = False but then pass all net.parameters() to the optimizer?
  2. What happens if I keep requires_grad = True but only pass a subset of the parameters to the optimizer (the subset I want to fine-tune, for example)?
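
For reference, the two-step setup I mean looks roughly like this (a simplified example that fine-tunes only the final layer of a torchvision ResNet):

```python
import torch
import torchvision

# Pretrained backbone; only the final fully connected layer will be trained
model = torchvision.models.resnet18(weights="DEFAULT")

# Step 1: freeze everything, then re-enable gradients for the last layer
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True

# Step 2: pass only the trainable parameters to the optimizer
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
```
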
  1. The parameters with requires_grad = False won’t accumulate a gradient during the backward pass, so their .grad attribute will stay None by default (assuming no earlier gradient was calculated).
    The optimizer will skip all parameters with a None gradient, as seen here.

  2. All parameters will accumulate gradients and the optimizer will only update the parameters that were passed to it. If you call optimizer.zero_grad() instead of model.zero_grad(), the “unused” parameters will keep accumulating gradients. If you later decide to pass these parameters to another optimizer, make sure to zero out their gradients beforehand. Both points are sketched below.
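
A quick sketch of both points, using a small toy model just for illustration:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
model[0].weight.requires_grad = False  # freeze one parameter

# (1) all parameters are passed, but the frozen one never gets a gradient
opt_all = torch.optim.SGD(model.parameters(), lr=0.1)
model(torch.randn(8, 4)).sum().backward()
print(model[0].weight.grad)  # None -> skipped in opt_all.step()
opt_all.step()

# (2) requires_grad is True everywhere, but only model[1] is passed
model[0].weight.requires_grad = True
opt_subset = torch.optim.SGD(model[1].parameters(), lr=0.1)
opt_subset.zero_grad()                    # clears only model[1]'s gradients
model(torch.randn(8, 4)).sum().backward()
print(model[0].weight.grad)  # not None; never cleared by opt_subset.zero_grad(),
                             # so it would keep accumulating across iterations
opt_subset.step()            # updates only model[1]
```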


Thank you @ptrblck
Does it mean that (1) and “setting requires_grad = False and only passing the other parameters to the optimizer” are equivalent?


They will yield equivalent results, yes.

Filtering out the frozen parameters is explicit and could thus improve code readability; it also avoids iterating over parameters with a None gradient in the step() method.
Approach (1) would allow you to unfreeze the parameters later in training and train them directly, without adding them via add_param_group() (see the sketch below).
I guess it might depend a bit on your coding style and on your exact use case (e.g. are you planning on unfreezing or not?).
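
For illustration, a rough sketch of the difference (the toy model is just a placeholder):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False

# Explicit filtering: only the currently trainable parameters are registered
optimizer = torch.optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=0.1)

# Unfreezing model[0] later therefore needs an additional param group
for p in model[0].parameters():
    p.requires_grad = True
optimizer.add_param_group({"params": model[0].parameters()})

# With approach (1) (all parameters passed up front), setting
# requires_grad = True again would already be enough.
```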


Thanks, @ptrblck, as usual very informative and precise answers. Maybe some of this info should be added to this tutorial: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

This might make sense. Would you be interested in creating an issue to discuss the current shortcomings and creating a fix for it after the discussion with the code owners? :slight_smile:


@ptrblck @Valerio_Biscione Did you happen to create an issue to integrate this into the docs? I think this is still not mentioned in the finetuning tutorial linked above.

No, I did not. Would you be interested in adding information that you think is missing to the docs?

No, I haven’t, sorry.