To require_grad or not to require_grad?


Say I have a 6-layer network that has been trained on some data. Now I want to re-train only the lowest layer (the layer closest to the data). I know that I can pass only that specific layer (say, self.fc1) to the optimizer, so that only its parameters get updated.
However, I am a little confused about whether I also need to set requires_grad=False for the other layers. Do I need to? What happens if I do or do not?

Thanks in advance,


Functionally it does not matter: since the other layers' parameters were never passed to the optimizer, they will not be updated either way. But computing gradients for them is wasted work, so setting requires_grad=False on the frozen layers avoids computing and storing those parameter gradients during the backward pass.
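A minimal sketch of the combined approach (the 6-layer model and the name fc1 are stand-ins, not your actual code). Note that because fc1 is the lowest layer, gradients still flow *through* the frozen upper layers back to it; freezing only skips computing gradients *for* their parameters:

```python
import torch
import torch.nn as nn

# Toy 6-layer network standing in for the trained model.
layers = nn.Sequential(*[nn.Linear(4, 4) for _ in range(6)])
fc1 = layers[0]  # lowest layer, the only one we want to re-train

# Freeze everything, then unfreeze fc1 only.
for p in layers.parameters():
    p.requires_grad = False
for p in fc1.parameters():
    p.requires_grad = True

# The optimizer only sees fc1's parameters.
opt = torch.optim.SGD(fc1.parameters(), lr=0.1)

x = torch.randn(2, 4)
loss = layers(x).sum()
opt.zero_grad()
loss.backward()
opt.step()

# fc1 received a gradient and was updated; the frozen layers
# never had gradients computed (their .grad stays None).
print(fc1.weight.grad is not None)      # True
print(layers[1].weight.grad is None)    # True
```

Even without the requires_grad changes the frozen layers would stay fixed (they are not in the optimizer), but with them the backward pass skips the unneeded parameter-gradient computations.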