Set requires_grad=False for earlier layers in a model

No, it doesn’t, and Autograd is smart enough to backpropagate the gradients through the frozen layer to the earlier trainable parameters.

Autograd will use this attribute to decide whether a gradient computation is needed for a given parameter.
E.g. freezing the parameter will skip the corresponding weight-gradient (wgrad) kernels, as seen in this example.
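A minimal sketch of this behavior (the three-layer model is made up for illustration): freezing a middle layer skips the weight-gradient computation for that layer, but gradients still flow through it to the earlier layer's parameters.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 4),  # earlier layer (trainable)
    nn.Linear(4, 4),  # middle layer (frozen below)
    nn.Linear(4, 1),  # later layer (trainable)
)

# Freeze the middle layer's parameters
for p in model[1].parameters():
    p.requires_grad = False

model(torch.randn(2, 4)).sum().backward()

# Gradients still reached the earlier layer through the frozen one,
# while no wgrad was computed for the frozen layer itself.
print(model[0].weight.grad is not None)  # True
print(model[1].weight.grad is None)      # True
print(model[2].weight.grad is not None)  # True
```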
