When I want to do transfer learning, I set requires_grad = False on the parameters I want to freeze and only pass the remaining parameters to the optimizer (a minimal sketch follows below). But what happens if I only do one of these two steps?
- What happens if I set several modules' requires_grad = False but then pass all net.parameters() to the optimizer?
- What happens if I keep requires_grad = True but only pass a subset of the parameters to the optimizer (the subset I want to fine-tune, for example)?
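For reference, here is a minimal sketch of what I mean by the two steps, assuming a torchvision ResNet-18 where only the fc head should be fine-tuned (the model and layer names are just an example):

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Step 1: freeze everything except the classifier head (model.fc)
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

# Step 2: pass only the still-trainable parameters to the optimizer
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)
```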
Thank you @ptrblck.
Does it mean that (1) and "setting requires_grad = False and only passing the other parameters to the optimizer" are equivalent?
They will yield equivalent results, yes.
Filtering out the parameters is more explicit, which could increase code readability, and it also avoids iterating over parameters without a grad attribute in the step() method.
Approach (1) would allow you to unfreeze the parameters later in training and train them directly, without having to add them via add_param_group().
I guess it might depend a bit on your coding style and on your exact use case (e.g. are you planning on unfreezing or not?).
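As a rough sketch of approach (1), assuming the same ResNet-18 example as above and reading (1) as "set requires_grad = False but pass all parameters to the optimizer" (the model and layer names are only illustrative):

```python
import torch
import torchvision

model = torchvision.models.resnet18(pretrained=True)

# Freeze the backbone, but pass *all* parameters to the optimizer (approach (1)).
for name, param in model.named_parameters():
    if not name.startswith("fc."):
        param.requires_grad = False

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# Frozen parameters never receive a .grad during backward, so optimizer.step()
# simply skips them.

# Later in training: unfreeze e.g. the last residual block directly,
# without calling optimizer.add_param_group().
for param in model.layer4.parameters():
    param.requires_grad = True
```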
Thanks, @ptrblck, as usual very informative and precise answers. Maybe some of this info should be added to this tutorial: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
This might make sense. Would you be interested in creating an issue to discuss the current shortcomings and create a fix for it after the discussion with the code owners? 
@ptrblck @Valerio_Biscione did you happen to create an issue to integrate this into the docs? I think this is still not mentioned in the finetuning tutorial linked above.
No, I did not. Would you be interested in adding information that you think is missing to the docs?