I am trying to fine-tune a pre-trained model by freezing different parts of it. Say the model has an encoder and a decoder. If I only want to train the encoder, I can freeze the decoder by setting `requires_grad=False` on its parameter tensors. However, does that block the gradient flow back to the encoder?
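Here is a minimal sketch of what I mean, with toy `nn.Linear` modules standing in for the real encoder and decoder:

```python
import torch.nn as nn

# Hypothetical stand-ins for the real pre-trained encoder/decoder
encoder = nn.Linear(8, 4)
decoder = nn.Linear(4, 2)

# "Freeze" the decoder: no gradients will be accumulated for these tensors
for p in decoder.parameters():
    p.requires_grad = False
```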
In practice, I found that autograd still computes the gradients for the encoder even though the grad for the decoder is 0. I am just curious what `requires_grad` means for a Tensor: does it block all gradients from flowing through that tensor, or does it just discard the record of the gradient, even once it has been computed?
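A minimal repro of what I observed, using the same toy modules as above (in this toy case the decoder's `.grad` simply stays `None` rather than 0, but either way no gradient is accumulated for it):

```python
import torch
import torch.nn as nn

# Same toy setup: hypothetical encoder/decoder with the decoder frozen
encoder = nn.Linear(8, 4)
decoder = nn.Linear(4, 2)
for p in decoder.parameters():
    p.requires_grad = False

loss = decoder(encoder(torch.randn(3, 8))).sum()  # loss depends on both modules
loss.backward()

# Autograd still backpropagates *through* the frozen decoder to the encoder...
print(encoder.weight.grad is not None)  # True -> the encoder received a gradient
# ...but nothing is stored for the decoder's own parameters
print(decoder.weight.grad)              # None
```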