To requires_grad or not to requires_grad?


Say I have a 6-layer network that has been trained on some data. Now I want to retrain only the lowest layer (the layer closest to the data). I know that I can pass only that specific layer's parameters (say, self.fc1) to the optimizer, so that only its parameters get updated.
However, I am a little confused about whether I also need to set requires_grad=False for the other layers. Do I need to? What happens if I do or do not?
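For reference, this is roughly what I mean — a minimal sketch with a hypothetical 6-layer network (the layer sizes are made up), where only self.fc1 is handed to the optimizer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical 6-layer network, for illustration only
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)  # lowest layer, closest to the data
        self.fc2 = nn.Linear(20, 20)
        self.fc3 = nn.Linear(20, 20)
        self.fc4 = nn.Linear(20, 20)
        self.fc5 = nn.Linear(20, 20)
        self.fc6 = nn.Linear(20, 2)

    def forward(self, x):
        for layer in (self.fc1, self.fc2, self.fc3, self.fc4, self.fc5):
            x = torch.relu(layer(x))
        return self.fc6(x)

net = Net()
# Only fc1's parameters are passed in, so optimizer.step()
# will only ever update fc1 — the other layers stay fixed.
optimizer = torch.optim.SGD(net.fc1.parameters(), lr=0.1)
```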

Thanks in advance,


It does not matter for correctness — the optimizer only updates the parameters you pass it — but without requires_grad=False the gradients for those layers are still computed, which is indeed wasted work.

Say you have a network composed of two parts: a feature extractor and a classifier. If I want to train only the feature extractor and not the classifier, I should pass the feature extractor's parameters to the optimizer and leave requires_grad=True for every parameter in the classifier, right? Or can I set requires_grad=False for every parameter in the classifier in this scenario?
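You can set it to False. A sketch with a made-up extractor/classifier split: the key point is that gradients still flow *through* the frozen classifier back to the feature extractor, because the extractor's outputs require grad — only the classifier's own parameter gradients are skipped.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical split: feature extractor + classifier
features = nn.Sequential(nn.Linear(10, 32), nn.ReLU())
classifier = nn.Linear(32, 2)
model = nn.Sequential(features, classifier)

# Freeze the classifier; only the feature extractor is trained.
for p in classifier.parameters():
    p.requires_grad = False

optimizer = torch.optim.SGD(features.parameters(), lr=0.1)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
# features[0].weight.grad is populated; classifier.weight.grad stays None
```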

I have no knowledge of the internals whatsoever, but from my simple testing it seems that it doesn't matter much: you can set requires_grad to False, but it doesn't give you any noticeable speed-up.
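One way to see why the speed-up is small (a sketch with a toy model, not a benchmark): even with the upper layer frozen, the backward pass still has to traverse it to deliver gradients to the bottom layer, so the only work saved is computing the frozen layer's own parameter gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

# Freeze the top layer; we still want to train the bottom one.
for p in net[2].parameters():
    p.requires_grad = False

net(torch.randn(4, 10)).sum().backward()

# Backward still passes through the frozen layer to reach net[0];
# only the frozen layer's weight gradient is skipped.
print(net[0].weight.grad is not None)  # bottom layer got gradients
print(net[2].weight.grad is None)      # frozen layer did not
```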