I would like to fine-tune only part of my model's parameters, so I set the `requires_grad` flag of the parameters I want to keep fixed to `False`. But it does not seem to work: all parameters still get optimized.
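For context, this is a simplified sketch of the pattern I am using (the model and layer names here are just placeholders, not my real code): freeze one branch by setting `requires_grad = False` on its parameters, and pass only the still-trainable parameters to the optimizer.

```python
import torch
import torch.nn as nn

# Hypothetical two-branch model sharing a shallow stem
class TwoBranch(nn.Module):
    def __init__(self):
        super().__init__()
        self.stem = nn.Linear(4, 8)
        self.branch1 = nn.Linear(8, 2)
        self.branch2 = nn.Linear(8, 2)

    def forward(self, x):
        h = torch.relu(self.stem(x))
        return self.branch1(h), self.branch2(h)

model = TwoBranch()

# Freeze the first branch: no gradients are computed for these parameters
for p in model.branch1.parameters():
    p.requires_grad = False

# Hand the optimizer only the parameters that still require gradients
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.1
)

before = model.branch1.weight.clone()
out1, out2 = model(torch.randn(3, 4))
(out1.sum() + out2.sum()).backward()
optimizer.step()

# The frozen branch's weights should be unchanged after the step
assert torch.equal(model.branch1.weight, before)
```

Filtering the parameters passed to the optimizer matters: some optimizers (e.g. SGD with `weight_decay`) can still modify parameters they were given even when those parameters receive no gradient.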
How do you know that it optimizes all parameters?
That is a good question! My model has two branches that share only the shallow layers. When I fine-tune the second branch, I set the `requires_grad` flag of the first branch's parameters to `False`. But at test time, the output of the first branch is no longer the same as before fine-tuning.
Is it possible to share some code to reproduce this?
Also, please mention which PyTorch version you are using.