Set requires_grad=False for earlier layers in a model

I am trying to fine-tune a pre-trained model by freezing different parts of it. Say the model has an encoder and a decoder. If I only want to train the encoder, I can freeze the decoder by setting its parameters' requires_grad attribute to False. However, does this block the gradient flow back to the encoder?

In practice, I found that autograd still computes the gradients for the encoder even though the grad for the decoder is 0. I am just curious what requires_grad actually means for a tensor: does it block all gradients, or is the gradient still computed and simply discarded afterwards?
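For reference, the freezing step described above might look something like this (the toy encoder/decoder below is just a placeholder for the actual pre-trained model):

```python
import torch.nn as nn

# Toy stand-in for a pre-trained encoder/decoder model.
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 4),
    "decoder": nn.Linear(4, 2),
})

# Freeze the decoder so only the encoder's parameters are updated.
for param in model["decoder"].parameters():
    param.requires_grad = False
```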

No, it doesn’t; Autograd is smart enough to backpropagate the gradients to earlier parameters.

Autograd will use this attribute to decide whether a gradient computation is needed or not.
E.g. freezing a parameter will skip the weight gradient (wgrad) kernels for it, while the input gradients needed by earlier layers are still computed, as seen in this example.
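A minimal sketch of that behavior (my own toy example, not the one referenced above): with the decoder frozen, a backward pass still populates the encoder's .grad, while the decoder's weight gradients are never computed.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(8, 4)
decoder = nn.Linear(4, 2)

# Freeze the decoder's parameters.
for p in decoder.parameters():
    p.requires_grad = False

x = torch.randn(3, 8)
loss = decoder(encoder(x)).sum()
loss.backward()

# Gradients still flow *through* the frozen decoder back to the encoder...
print(encoder.weight.grad is not None)  # True
# ...but no weight gradients are computed (or stored) for the decoder itself.
print(decoder.weight.grad is None)      # True
```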
