Fine-tune the last layer while fixing the other layers

Hi, I have some problems with fine-tuning the last layer of a neural network.
The code is as follows:

for layer in self.layers[0:-1]: 
        for param in self.model.fc_layers[layer].parameters():
            param.requires_grad = False
        
        y_prime = self.model(X)
        loss = self.model.criterion(y_prime, y)
        
        self.model.optimize.zero_grad()
        self.model.zero_grad()
        
        loss.backward(retain_graph=1)
        self.model.optimize.step()

The model is a neural network.
It didn’t work: when I fine-tune (re-train) only the last layer, I get the same result as when I don’t freeze any layer’s parameters.

Something is wrong here, but I can’t figure it out.

Could you check if the parameters of the frozen layers get valid gradients after the backward call via:

loss.backward()
print(self.model.frozen_layer.weight.grad)
print(self.model.last_layer.weight.grad)

If the frozen layers yield None, while the last linear layer yields a valid gradient, your code is working as expected.
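
If you are not sure about the attribute names, you could also iterate over all parameters after the backward call (a rough sketch; model, criterion, X, and y stand in for your own objects):

out = model(X)
loss = criterion(out, y)
loss.backward()

# parameters that never received a gradient print None,
# trainable parameters print a gradient shape
for name, param in model.named_parameters():
    print(name, None if param.grad is None else param.grad.shape)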

Hi, thanks.
It throws an error: the model has no ‘frozen_layer’ module.
So I changed it to

print(self.model.fc_layers[-2].weight.grad)
print(self.model.last_layer.weight.grad)

The second-to-last layer should be frozen, as in the code in my question.
But the output is not None; it is a gradient matrix.
So it is not working as expected, I think.

Yes, frozen_layer and last_layer are just placeholders for your actual layer names.

Did you train the “frozen” layers before and forget to zero out the gradients?
If not, could you post a code snippet which represents your training routine and how and where you are freezing the layers?
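
To illustrate the stale-gradient case, here is a tiny standalone example (a toy nn.Linear, not your model): a .grad attribute that was filled in an earlier training step stays around after you set requires_grad = False.

import torch
import torch.nn as nn

layer = nn.Linear(4, 2)

# a "pre-training" step fills layer.weight.grad
layer(torch.randn(1, 4)).sum().backward()

# freezing afterwards does not clear the old gradient
layer.weight.requires_grad = False
print(layer.weight.grad)  # still shows the gradient from the step above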

Your initial code snippet looks like it has some indentation issues, as the training step would be executed inside your “layer loop” for every layer.
Is this a copy-paste error or are you using this code in your script as shown here?
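
If it’s a real indentation issue, the freezing loop should run once before the training iterations, roughly like this (a sketch reusing the attribute names from your snippet, assuming they match your model):

# freeze everything except the last layer (run once)
for layer in self.layers[0:-1]:
    for param in self.model.fc_layers[layer].parameters():
        param.requires_grad = False

# training iteration (run repeatedly)
y_prime = self.model(X)
loss = self.model.criterion(y_prime, y)

self.model.optimize.zero_grad()
loss.backward()
self.model.optimize.step()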

Oh, sorry, yes, it’s a copy-paste mistake.

I did train the frozen layers before; the training function is as follows:

def step_mlp(self, X, y):
    y_prime = self(X)
    loss = self.criterion(y_prime, y)

    self.optimize.zero_grad()
    self.zero_grad()

    loss.backward(retain_graph=1)
    self.optimize.step()

Then I re-train the model with all layers frozen except the last one, as described in the question.
Freezing the layers:

for layer in self.layers[0:-1]:
    for param in self.model.fc_layers[layer].parameters():
        param.requires_grad = False

Then re-train:

y_prime = self.model(X)
loss = self.model.criterion(y_prime, y)

self.model.optimize.zero_grad()
self.model.zero_grad()
loss.backward(retain_graph=1)
self.model.optimize.step()

Your general workflow should work as shown in this small example:

# Setup
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18()
data = torch.randn(1, 3, 224, 224)
target = torch.randint(0, 1000, (1,))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Train
optimizer.zero_grad()
out = model(data)
loss = criterion(out, target)
loss.backward()
optimizer.step()

# Freeze all but last layer
for name, param in model.named_parameters():
    if 'fc' not in name:
        param.requires_grad = False

optimizer.zero_grad()
out = model(data)
loss = criterion(out, target)
loss.backward()

# Check grads
grad_frozen = model.conv1.weight.grad
grad_fc = model.fc.weight.grad
print(grad_frozen.abs().sum())
print(grad_fc.abs().sum())

optimizer.step()

As you can see, after freezing all parameters but the fc.weight and fc.bias, I get a zero gradient for a previous layer, while I get valid gradients for the last linear layer.

Which layers are you checking and are you sure the checked frozen layer is in model.fc_layers?
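
One quick way to check (just a sketch; adapt the names to your model) is to print the requires_grad flag of every parameter right before the backward call:

for name, param in self.model.named_parameters():
    print(name, param.requires_grad)  # frozen parameters should print False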


Thanks for your detailed example, ptrblck!

Found the error: it turns out I didn’t have any frozen layers in model.fc_layers because I made a mistake in my code.
Your reply helped me find it. Thanks a lot!
