Hi,
I want to freeze (some) layers of a network feature encoder (resnet50 in my case) and then add a dense layer on top of the feature encoder to evaluate it on a classification task.
I am freezing the layers like this:
for child in self.feature_extractor.children():
    for param in child.parameters():
        param.requires_grad = False
First question: Is this the correct way? I saw some people use named_children() instead of children(). Also, some people call parameters() directly in the first for loop and leave out the second one. Are there any differences between those methods?
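For reference, the single-loop variant I mean would look like this (my understanding is that parameters() recurses through all submodules, so it should touch the same tensors):

for param in self.feature_extractor.parameters():
    param.requires_grad = False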
I also saw some people only pass the parameters that have requires_grad=True to the optimizer:
optimizer = optim.SGD(filter(lambda p: p.requires_grad, self.parameters()), lr=0.1)
Second question: Is this filtering still required after freezing the parameters as above?
To add the dense layer I create a new layer with self.classifier = nn.Linear(...) and use it in the forward() function like this:
representations = self.feature_extractor(x).flatten(1)
return self.classifier(representations)
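For context, here is roughly the full module I have in mind (a minimal sketch; the torchvision backbone construction, the 2048 input features, and the FrozenResNetClassifier name are my assumptions, not fixed yet):

import torch.nn as nn
from torchvision import models

class FrozenResNetClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # assumption: take resnet50 up to (and including) the global average pool
        backbone = models.resnet50(pretrained=True)
        self.feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
        # freeze the backbone
        for param in self.feature_extractor.parameters():
            param.requires_grad = False
        # resnet50's pooled features are 2048-dimensional
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, x):
        representations = self.feature_extractor(x).flatten(1)
        return self.classifier(representations)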
Using the optimizer like this (passing self.parameters() as the params argument):
optimizer = optim.SGD(params=self.parameters(), lr=0.1)
should work fine, right? It should calculate gradients for the classifier (and for the feature extractor, in case I don't want to freeze it).
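As a sanity check, I was planning to verify this on a dummy batch (using the hypothetical FrozenResNetClassifier sketch from above):

import torch
import torch.optim as optim

model = FrozenResNetClassifier(num_classes=10)  # assumed class from the sketch above
optimizer = optim.SGD(model.parameters(), lr=0.1)

out = model(torch.randn(2, 3, 224, 224))  # dummy input batch
out.sum().backward()

# frozen parameters should never receive gradients
print(all(p.grad is None for p in model.feature_extractor.parameters()))  # expect True
# the classifier parameters should
print(all(p.grad is not None for p in model.classifier.parameters()))     # expect True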
Thanks a lot