Passing frozen parameters to the optimizer

When I want to do transfer learning, I set requires_grad = False on the frozen parameters and pass only the remaining parameters to the optimizer. But what happens if I do only one of these two steps?

  1. What happens if I set several modules’ requires_grad = False but then pass all of net.parameters() to the optimizer?
  2. What happens if I keep requires_grad = True but only pass a subset of the parameters to the optimizer (e.g., the subset I want to fine-tune)?
  1. The parameters with requires_grad = False won’t accumulate a gradient during the backward pass and their .grad attribute will stay None by default (if no previous gradient was calculated).
    The optimizer will skip all parameters with a None gradient as seen here.

  2. All parameters will accumulate gradients, but the optimizer will only update the passed parameters. If you call optimizer.zero_grad() and don’t use model.zero_grad(), the “unused” parameters will continue to accumulate gradients. If you later decide to use these parameters in another optimizer, you should make sure to zero out their gradients beforehand.
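A minimal sketch of both scenarios (the toy model, layer shapes, and variable names are illustrative, not from the thread):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# --- Scenario 1: freeze a module but pass *all* parameters ---
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad = False
opt = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(3, 4)).sum().backward()
# Frozen parameters accumulate no gradient: model[0].weight.grad is None,
# while model[1].weight.grad is a tensor. opt.step() simply skips the
# parameters whose .grad is None.
opt.step()

# --- Scenario 2: everything requires grad, optimizer gets a subset ---
model2 = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
opt2 = torch.optim.SGD(model2[1].parameters(), lr=0.1)  # last layer only

model2(torch.randn(3, 4)).sum().backward()
opt2.step()
opt2.zero_grad()  # clears only the gradients of model2[1]!
still_accumulating = model2[0].weight.grad is not None  # the "unused" params keep their grads
model2.zero_grad()  # clears every parameter's gradient
```

Note the asymmetry at the end: optimizer.zero_grad() only touches the parameters the optimizer was given, so the first layer’s gradients survive it until model.zero_grad() is called.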


Thank you @ptrblck
does it mean that (1) and “setting requires_grad = False and only passing the other parameters to the optimizer” are equivalent?


They will yield equivalent results, yes.

Filtering out the parameters is explicit and can thus improve code readability; it also avoids iterating over parameters without a grad attribute in the step method.
Approach (1) would allow you to unfreeze the parameters later in training directly and train them without adding them via add_param_group.
I guess it might depend a bit on your coding style and what your exact use case is (e.g. are you planning on unfreezing or not?).
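A short sketch of the unfreezing point above (toy model and names are illustrative): with approach (1) the optimizer already holds every parameter, so flipping requires_grad back on is enough, with no add_param_group needed.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))
for p in model[0].parameters():
    p.requires_grad = False

# The optimizer is given *all* parameters up front; frozen ones are
# skipped during step() because their .grad stays None.
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# ... train with model[0] frozen for some epochs ...

for p in model[0].parameters():
    p.requires_grad = True  # unfreeze: gradients now flow to these params

model(torch.randn(3, 4)).sum().backward()
opt.step()  # now updates model[0] as well
```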


thanks, @ptrblck, as usual very informative and precise answers. Maybe some of this info should be added to this tutorial

This might make sense. Would you be interested in creating an issue to discuss the current shortcomings and create a fix for it after the discussion with the code owners? :slight_smile:


@ptrblck @Valerio_Biscione did you happen to create an issue to integrate this into the docs? I think this is still not mentioned in the finetuning tutorial referenced above.

No, I did not. Would you be interested in adding information that you think is missing to the docs?

no, I haven’t, sorry