When I want to do transfer learning, I set `requires_grad = False` and only pass the other parameters to the optimizer. But what happens if I only do one of these two steps?
- What happens if I set several modules’ `requires_grad = False` but then pass all `net.parameters()` to the optimizer?
- What happens if I keep `requires_grad = True` but only pass a subset of the parameters to the optimizer (the subset I want to fine-tune, for example)?
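For reference, the usual "both steps" setup could be sketched like this (a hypothetical two-layer model stands in for a pretrained backbone plus a new head):

```python
import torch
import torch.nn as nn

# Hypothetical stand-in: model[0] plays the pretrained backbone,
# model[1] plays the new head we want to fine-tune.
model = nn.Sequential(
    nn.Linear(10, 8),  # "backbone" to freeze
    nn.Linear(8, 2),   # "head" to train
)

# Step 1: freeze the backbone so autograd stops computing its gradients.
for p in model[0].parameters():
    p.requires_grad = False

# Step 2: hand only the still-trainable parameters to the optimizer.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

model(torch.randn(4, 10)).sum().backward()
# Frozen parameters never receive a gradient; the head does.
print(model[0].weight.grad is None)      # True
print(model[1].weight.grad is not None)  # True
```

The two bullet questions above correspond to doing only Step 1 or only Step 2.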
Thank you @ptrblck
Does it mean that (1) and “setting `requires_grad = False` and only passing the other parameters to the optimizer” are equivalent?
They will yield equivalent results, yes.
Filtering out the parameters is explicit and could thus increase the code readability, and it will also avoid iterating over parameters without a valid `grad` attribute in the optimizer.
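To illustrate the point about `grad` attributes, here is a sketch of approach (1): freezing via `requires_grad = False` while still passing all parameters to the optimizer (hypothetical tiny model). With a plain optimizer like SGD the frozen parameters simply keep `grad = None` and are skipped, but any custom code that iterates over the optimizer's parameters has to handle that `None`:

```python
import torch
import torch.nn as nn

# Hypothetical model: freeze the first layer only.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False

# Approach (1): all parameters go to the optimizer, frozen or not.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

model(torch.randn(2, 4)).sum().backward()
optimizer.step()  # frozen params have grad=None and are skipped

# Any loop over the optimizer's parameters now sees a mix of
# None and real gradients.
for p in optimizer.param_groups[0]["params"]:
    print(p.grad is None)
```

The explicit filtering approach avoids this mixed state, since every parameter the optimizer holds has a real gradient after `backward()`.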
Approach (1) would allow you to unfreeze the parameters later in training directly and train these parameters without adding them via `optimizer.add_param_group()`.
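For comparison, with the filtered approach a later unfreeze needs that extra registration step, since the frozen parameters were never handed to the optimizer. A sketch with a hypothetical model (the separate learning rate for the unfrozen group is just an illustrative choice):

```python
import torch
import torch.nn as nn

# Hypothetical model; model[0] starts out frozen.
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
for p in model[0].parameters():
    p.requires_grad = False

# Filtered approach: the optimizer starts with only the trainable head.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=0.1
)

# Later in training: unfreeze the backbone...
for p in model[0].parameters():
    p.requires_grad = True
# ...and register it with the optimizer, since it was filtered out
# at construction time. A lower lr for the backbone is a common choice.
optimizer.add_param_group({"params": model[0].parameters(), "lr": 0.01})

print(len(optimizer.param_groups))  # 2
```

With approach (1) the `add_param_group` call is unnecessary, because every parameter is already in the optimizer; flipping `requires_grad` back to `True` is enough.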
I guess it might depend a bit on your coding style and what your exact use case is (e.g. are you planning on unfreezing or not?).
Thanks, @ptrblck, as usual very informative and precise answers. Maybe some of this info should be added to this tutorial: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
This might make sense. Would you be interested in creating an issue to discuss the current shortcomings and create a fix for it after the discussion with the code owners?