Hi,
I want to freeze (some) layers of a network feature encoder (resnet50 in my case) and then add a dense layer on top of the feature encoder to evaluate it on a classification task.
I am freezing the layers like this:
for child in self.feature_extractor.children():
    for param in child.parameters():
        param.requires_grad = False
First question: Is this the correct way? I saw some people use named_children() instead of children(). Also, some people call parameters() directly in the first for loop and leave out the second one. Are there any differences between those methods?
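For reference, the single-loop variant I mean would look like this (my understanding is that parameters() recurses through all submodules, so it should touch the same tensors):

for param in self.feature_extractor.parameters():
    param.requires_grad = False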
I also saw some people only pass the parameters that have requires_grad=True to the optimizer:
optimizer = optim.SGD(filter(lambda p: p.requires_grad, self.parameters()), lr=0.1)
Second question: Is this filtering still required after freezing the parameters as above?
To add the dense layer I create a new layer with self.classifier = nn.Linear(...) and use it in the forward() function like this:
representations = self.feature_extractor(x).flatten(1)
return self.classifier(representations)
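For context, here is roughly the full module I have in mind (a minimal sketch; the torchvision backbone construction, the 2048 input features, and the FrozenResNetClassifier name are my assumptions, not fixed yet):

import torch.nn as nn
from torchvision import models

class FrozenResNetClassifier(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # assumption: take resnet50 up to (and including) the global average pool
        backbone = models.resnet50(pretrained=True)
        self.feature_extractor = nn.Sequential(*list(backbone.children())[:-1])
        # freeze the backbone
        for param in self.feature_extractor.parameters():
            param.requires_grad = False
        # resnet50's pooled features are 2048-dimensional
        self.classifier = nn.Linear(2048, num_classes)

    def forward(self, x):
        representations = self.feature_extractor(x).flatten(1)
        return self.classifier(representations)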
Using the optimizer like this (passing self.parameters() as the params argument):
optimizer = optim.SGD(params=self.parameters(), lr=0.1)
should work fine, right? It should calculate gradients for the classifier (and for the feature extractor, in case I don't want to freeze it).
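As a sanity check, I was planning to verify this on a dummy batch (using the hypothetical FrozenResNetClassifier sketch from above):

import torch
import torch.optim as optim

model = FrozenResNetClassifier(num_classes=10)  # assumed class from the sketch above
optimizer = optim.SGD(model.parameters(), lr=0.1)

out = model(torch.randn(2, 3, 224, 224))  # dummy input batch
out.sum().backward()

# frozen parameters should never receive gradients
print(all(p.grad is None for p in model.feature_extractor.parameters()))  # expect True
# the classifier parameters should
print(all(p.grad is not None for p in model.classifier.parameters()))     # expect True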
Thanks a lot