Freezing intermediate layers while training top and bottom layers

Maybe, in my case, I should not be setting requires_grad=False on the L2 parameters; instead, I should exclude all L2 parameters from the optimizer. That way, gradients will still flow back to L1's parameters, but the optimizer will never update L2's parameters (which is analogous to freezing L2 while keeping L1 trainable). A minimal sketch of what I mean is below.
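
Here is a rough sketch of the idea (the module names L1/L2/L3, the shapes, and the optimizer settings are just placeholders I made up for illustration):

```python
import torch
import torch.nn as nn

# Toy model with three stages; L2 is the part I want to "freeze".
model = nn.Sequential()
model.add_module("L1", nn.Linear(10, 20))
model.add_module("L2", nn.Linear(20, 20))
model.add_module("L3", nn.Linear(20, 5))

# Keep requires_grad=True everywhere so gradients still flow through L2
# back to L1, but hand the optimizer only the non-L2 parameters so L2 is
# never updated.
l2_param_ids = {id(p) for p in model.L2.parameters()}
trainable_params = [p for p in model.parameters() if id(p) not in l2_param_ids]

optimizer = torch.optim.SGD(trainable_params, lr=1e-2)

# One training step to illustrate: L2's weights receive gradients but are
# left untouched by optimizer.step().
x = torch.randn(4, 10)
target = torch.randn(4, 5)
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```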

Is this a correct approach? :slight_smile: