Set requires_grad=False for earlier layers in a model

No, it doesn’t, and Autograd is smart enough to backpropagate the gradients through the frozen layer to the earlier trainable parameters.

Autograd will use this attribute to decide whether a gradient computation is needed for a given parameter.
E.g. freezing the parameter will skip the corresponding weight-gradient (wgrad) kernels, as seen in this example.
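A minimal sketch of this behavior (the three-layer model is made up for illustration): freezing a middle layer skips the weight-gradient computation for that layer, but gradients still flow through it to the earlier layer's parameters.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(4, 4),  # earlier layer (trainable)
    nn.Linear(4, 4),  # middle layer (frozen below)
    nn.Linear(4, 1),  # later layer (trainable)
)

# Freeze the middle layer's parameters
for p in model[1].parameters():
    p.requires_grad = False

model(torch.randn(2, 4)).sum().backward()

# Gradients still reached the earlier layer through the frozen one,
# while no wgrad was computed for the frozen layer itself.
print(model[0].weight.grad is not None)  # True
print(model[1].weight.grad is None)      # True
print(model[2].weight.grad is not None)  # True
```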
