Set requires_grad=False for earlier layers in a model

I am trying to fine-tune a pre-trained model by freezing different parts of it. Say the model has an encoder and a decoder. If I only want to train the encoder, I can freeze the decoder by setting its parameters' requires_grad attribute to False. However, does this block the gradient flow back to the encoder?

In practice, I found that autograd still computes the gradients for the encoder even though the grad for the decoder is 0. I am just curious what requires_grad actually means for a tensor: does it block all gradients, or is the gradient still computed and simply discarded afterwards?
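For reference, the freezing step described above might look something like this (the toy encoder/decoder below is just a placeholder for the actual pre-trained model):

```python
import torch.nn as nn

# Toy stand-in for a pre-trained encoder/decoder model.
model = nn.ModuleDict({
    "encoder": nn.Linear(8, 4),
    "decoder": nn.Linear(4, 2),
})

# Freeze the decoder so only the encoder's parameters are updated.
for param in model["decoder"].parameters():
    param.requires_grad = False
```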

No, it doesn’t; Autograd is smart enough to backpropagate the gradients to earlier parameters.

Autograd will use this attribute to decide whether a gradient computation is needed or not.
E.g. freezing a parameter will skip the weight gradient (wgrad) kernels for it, while the input gradients needed by earlier layers are still computed, as seen in this example.
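A minimal sketch of that behavior (my own toy example, not the one referenced above): with the decoder frozen, a backward pass still populates the encoder's .grad, while the decoder's weight gradients are never computed.

```python
import torch
import torch.nn as nn

encoder = nn.Linear(8, 4)
decoder = nn.Linear(4, 2)

# Freeze the decoder's parameters.
for p in decoder.parameters():
    p.requires_grad = False

x = torch.randn(3, 8)
loss = decoder(encoder(x)).sum()
loss.backward()

# Gradients still flow *through* the frozen decoder back to the encoder...
print(encoder.weight.grad is not None)  # True
# ...but no weight gradients are computed (or stored) for the decoder itself.
print(decoder.weight.grad is None)      # True
```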
