Freezing layers vs not giving parameters to optimizer

Instead of freezing the layers as described in the documentation (using requires_grad = False), would it be equivalent to pass just the trainable parameters to the optimizer and simply not include the “frozen” parameters?

I imagine this case would work as expected:
Frozen Layer (not given to optimizer) -> Trainable Layer -> OUTPUT

But I am unsure if this would be OK:
Trainable Layer -> Frozen Layer (not given to optimizer) -> OUTPUT

I am unsure because gradients will (I think?) still accumulate in the frozen layer during backpropagation of the loss, and would this affect the optimizer’s ability to properly optimize the trainable layer?
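For concreteness, here is a minimal sketch of that second setup (layer sizes and learning rate are made up): only the first layer’s parameters go to the optimizer, and requires_grad is left untouched on the second layer.

```python
import torch
import torch.nn as nn

trainable = nn.Linear(10, 10)  # Trainable Layer
frozen = nn.Linear(10, 1)      # "Frozen" Layer: just not given to the optimizer

# Only the trainable layer's parameters are registered with the optimizer
optimizer = torch.optim.SGD(trainable.parameters(), lr=0.1)

x = torch.randn(4, 10)
target = torch.randn(4, 1)

loss = nn.functional.mse_loss(frozen(trainable(x)), target)

optimizer.zero_grad()
loss.backward()   # gradients are computed for BOTH layers
optimizer.step()  # but only the trainable layer is updated

print(frozen.weight.grad is not None)  # True: grads still accumulate in the frozen layer
```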

I believe those two methods (using requires_grad=False and not giving the parameters to the optimizer) are very similar, if not the same in many cases.

Yes, if you only leave the frozen layer’s parameters out of the optimizer (and don’t set requires_grad=False), gradients will still accumulate in that layer during backpropagation. The effect of this depends on how the optimizer works; for something like plain SGD, for each Variable/Parameter the optimizer subtracts its gradient (scaled by the learning rate) from its data. In this case each Variable/Parameter’s update is independent of the others, so the two methods of freezing a layer are equivalent: the leftover gradients in the frozen layer are simply never applied, and the gradients that flow back into the trainable layer via the chain rule are not affected by them.
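A quick way to check that for plain SGD (toy sizes and a single step, not taken from the original post): the trainable layer ends up with exactly the same weights whether the frozen layer is merely left out of the optimizer or additionally has requires_grad=False.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Method A: frozen layer simply not passed to the optimizer
trainable_a = nn.Linear(10, 10)
frozen_a = nn.Linear(10, 1)

# Method B: identical copies, but the frozen layer also gets requires_grad=False
trainable_b = copy.deepcopy(trainable_a)
frozen_b = copy.deepcopy(frozen_a)
for p in frozen_b.parameters():
    p.requires_grad = False

opt_a = torch.optim.SGD(trainable_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(trainable_b.parameters(), lr=0.1)

x = torch.randn(4, 10)
target = torch.randn(4, 1)

for opt, trainable, frozen in [(opt_a, trainable_a, frozen_a),
                               (opt_b, trainable_b, frozen_b)]:
    opt.zero_grad()
    loss = nn.functional.mse_loss(frozen(trainable(x)), target)
    loss.backward()
    opt.step()

# Both trainable layers received exactly the same update
print(torch.allclose(trainable_a.weight, trainable_b.weight))  # True
```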

If an optimizer uses knowledge about other parameters to update one parameter, then yes, this could be a problem.
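As a purely illustrative aside (not from the original exchange): anything that processes gradients across parameters jointly, for example clipping by global norm, will see the frozen layer’s leftover gradients under the “not in the optimizer” approach, whereas with requires_grad=False those gradients are None and get skipped, so the two approaches can diverge there.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.Linear(10, 1))

loss = model(torch.randn(4, 10)).sum()
loss.backward()

# The global norm here includes the second layer's gradients; if that layer had
# requires_grad=False, its grads would be None and excluded, changing the scale.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```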
