Hi, I am trying to train a model with transfer learning. When I set
requires_grad=True for only a few layers but pass all of my model's parameters to the optimizer (via
model.parameters()), I see that even the layers with
requires_grad=False get their weights updated. However, if I instead pass
filter(lambda p: p.requires_grad, model.parameters()) to the optimizer, only the correct weights get updated.
Is this the correct behavior, or is there some error in my code?
If this is the correct behavior, then how does my optimizer update weights whose requires_grad is set to False?
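For concreteness, here is a minimal sketch of the filtering approach described above. The tiny two-layer model is hypothetical, standing in for a pretrained network; the point is just the freeze-then-filter pattern:

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Tiny stand-in for a pretrained network (hypothetical example model).
model = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 2))

# Freeze the first layer, as one would in transfer learning.
for p in model[0].parameters():
    p.requires_grad = False

# Pass only the trainable parameters to the optimizer.
optimizer = optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=0.1
)

frozen_before = model[0].weight.clone()

# One training step on random data.
loss = model(torch.randn(3, 4)).sum()
loss.backward()
optimizer.step()

# The frozen layer's weights are unchanged.
print(torch.equal(frozen_before, model[0].weight))  # True
```

With requires_grad=False, backward() never populates .grad for the frozen layer, and since those parameters were filtered out of the optimizer, step() cannot touch them either.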