Hi, I am trying to train a model with transfer learning. When I set requires_grad=True for a few layers, but I send all my model parameters to the optimizer (by using model.parameters()), I see that even layers with requires_grad=False gets its weights updated. However, if instead I use filter(lambda p: p.requires_grad, net.parameters()) to set my model parameters in the optimizer, the correct weights get updated.
Is this the correct behavior, or is there some error in my code?
If this is the correct behavior, then how does my optimizer update weights when requires_grad=False?
If these layers were previously updated, i.e. they got a valid gradient in previous iterations, are now frozen via .requires_grad=True, optimizers with running stats (such as Adam) can still update these parameters.
To avoid this, you could try to set their .grad attributes to None, which should skip the updates.