What is the behavior of passing model parameters with `requires_grad == False` to an optimizer?

What is the behavior of passing all or some model parameters with `requires_grad == False` to an optimizer?

I have a model that I want to fine-tune. Some layers are frozen (`requires_grad == False` and `.eval()`) and keep their pre-trained values. Should I remove the parameters with `requires_grad == False` before passing them to the optimizer?
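
For concreteness, something like this (toy model, the layer names and sizes are just placeholders):

```python
import torch.nn as nn

# Toy stand-in for a pre-trained model
model = nn.Sequential(
    nn.Linear(10, 10),  # pre-trained layer I want to freeze
    nn.ReLU(),
    nn.Linear(10, 2),   # head I want to fine-tune
)

# Freeze the first layer and put it into eval mode
for p in model[0].parameters():
    p.requires_grad = False
model[0].eval()
```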

All parameters whose `.grad` attribute is `None` will be skipped by the optimizer's `step()` method. You could avoid relying on this behavior by filtering out all parameters which don't require gradients.
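
E.g. a minimal sketch of the filtering (the toy model and learning rate are just placeholders):

```python
import torch
import torch.nn as nn

# Toy model; freeze the first layer as in the question
model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 2))
for p in model[0].parameters():
    p.requires_grad = False

# Pass only the trainable parameters to the optimizer
optimizer = torch.optim.Adam(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-3,
)
```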


So filtering out the parameters which don't require gradients is optional?
Since these parameters will be ignored anyway, why should I worry about them? The filtering step seems useless and would only clutter my code.

I would recommend filtering them out, as this follows the Zen of Python: "Explicit is better than implicit."

An unwanted side effect of passing all parameters is that parameters which are frozen now can still be updated: if they required gradients before being frozen, their stale `.grad` tensors are still set, and an optimizer with running internal estimates, e.g. Adam, will keep applying updates based on them.
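
A minimal sketch demonstrating this side effect (single-parameter toy example, the learning rate and values are arbitrary):

```python
import torch
import torch.nn as nn

param = nn.Parameter(torch.ones(2))
optimizer = torch.optim.Adam([param], lr=0.1)

# One "pre-freeze" step so Adam builds running estimates and param.grad is set
loss = (param ** 2).sum()
loss.backward()
optimizer.step()

# Freeze the parameter; note its stale .grad tensor is still populated
param.requires_grad = False
before = param.detach().clone()

# No backward pass here, yet step() still updates the frozen parameter,
# driven by the stale .grad and Adam's running estimates
optimizer.step()
print(torch.equal(before, param))  # False: the frozen parameter changed
```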
