I am a bit confused about the correct way to freeze layers.
Suppose I have the following NN: layer1, layer2, layer3
I want to freeze the weights of layer2, and only update layer1 and layer3.
Based on other threads, I am aware of the following ways of achieving this goal.
Method 1:
- optim = {layer1, layer3}
- compute loss
- loss.backward()
- optim.step()
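A minimal sketch of Method 1, assuming a toy three-layer model (all names and shapes here are illustrative):

import torch
import torch.nn as nn
import torch.optim as optim

# toy model: three linear layers standing in for layer1, layer2, layer3
model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 20), nn.Linear(20, 2))
layer1, layer2, layer3 = model

# Method 1: hand the optimizer only the parameters of layer1 and layer3
optimizer = optim.SGD(list(layer1.parameters()) + list(layer3.parameters()), lr=0.1)

x, target = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()   # gradients are still computed for layer2 here
optimizer.step()  # but the optimizer never touches layer2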
Method 2:
- set requires_grad = False for layer2's parameters
- optim = {all layers with requires_grad = True}
- compute loss
- loss.backward()
- optim.step()
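A minimal sketch of Method 2 with the same kind of toy model (this is essentially what the reply below does with filter):

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 20), nn.Linear(20, 2))
layer1, layer2, layer3 = model

# Method 2: turn off gradient tracking for layer2, then build the optimizer
# from the parameters that still require gradients
for p in layer2.parameters():
    p.requires_grad_(False)
optimizer = optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)

x, target = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()   # no gradients are computed for layer2 at all
optimizer.step()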
Method 3:
- optim = {layer1, layer2, layer3}
- layer2_old_weights = layer2.weight (this saves layer2 weights to a variable)
- compute loss
- loss.backward()
- optim.step()
- layer2.weight = layer2_old_weights (this sets layer2 weights to old weights)
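A minimal sketch of Method 3, again with an illustrative toy model; note that a stateful optimizer (momentum, Adam) still updates its internal statistics for layer2 even though its weights are put back:

import copy
import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 20), nn.Linear(20, 2))
layer1, layer2, layer3 = model
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Method 3: remember layer2's weights, let the optimizer update everything,
# then overwrite layer2 with the saved copy
layer2_old_weights = copy.deepcopy(layer2.state_dict())

x, target = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()
optimizer.step()
layer2.load_state_dict(layer2_old_weights)  # undo the update to layer2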
Method 4:
- optim = {layer1, layer2, layer3}
- compute loss
- loss.backward()
- set layer2 gradients to 0
- optim.step()
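A minimal sketch of Method 4 with the same kind of toy model; with plain SGD a zero gradient means no update, but with weight decay or an optimizer such as Adam the parameter can still move:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(nn.Linear(10, 20), nn.Linear(20, 20), nn.Linear(20, 2))
layer1, layer2, layer3 = model
optimizer = optim.SGD(model.parameters(), lr=0.1)

x, target = torch.randn(8, 10), torch.randint(0, 2, (8,))
loss = nn.functional.cross_entropy(model(x), target)
loss.backward()

# Method 4: zero out layer2's gradients before the optimizer step
for p in layer2.parameters():
    if p.grad is not None:
        p.grad.zero_()
optimizer.step()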
My questions:
- Should we get different results for each method?
- Is any of these methods wrong?
- Is there a preferred method?
I would do it the following way:

import torch.optim as optim

# we want to freeze the fc2 layer this time: only train fc1 and fc3
net.fc2.weight.requires_grad = False
net.fc2.bias.requires_grad = False

# pass only those parameters that still require gradients to the optimizer
optimizer = optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=0.1)

# then do the normal loss computation and backward pass

# unfreeze the fc2 layer later if it needs extra tuning
net.fc2.weight.requires_grad = True
net.fc2.bias.requires_grad = True

# add the now-unfrozen fc2 parameters to the existing optimizer
optimizer.add_param_group({'params': net.fc2.parameters()})
@kelam_goutam I believe your way is the same as Method 2 described above. Can you please explain why you prefer this over others?
I feel Methods 3 and 4 are a waste of computation. Why compute gradients for layers you don't want to update? I think Method 1 would be ideal, since with Method 2 we need to explicitly mark the layer's parameters with requires_grad = False and it is then our responsibility to mark them as True again if we want to update those layers later. However, I preferred Method 2, thinking that it makes it easier to freeze the weights of any layer in a huge network, since the optimizer automatically gathers which layers to update.
Warm regards,
Goutam Kelam
If you can change the contents of a layer's forward method, you can use self.eval() and with torch.no_grad():
if self.frozen_pretrained_weights:
    self.eval()
    with torch.no_grad():
        output = self.encoders(batch)
else:
    output = self.encoders(batch)
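A self-contained sketch of that pattern, with hypothetical module and attribute names (frozen_pretrained_weights, encoders are placeholders for whatever your model uses):

import torch
import torch.nn as nn

class EncoderWrapper(nn.Module):
    # hypothetical wrapper: 'encoders' stands in for a pretrained sub-network
    def __init__(self, frozen_pretrained_weights=True):
        super().__init__()
        self.frozen_pretrained_weights = frozen_pretrained_weights
        self.encoders = nn.Sequential(nn.Linear(10, 10), nn.ReLU())

    def forward(self, batch):
        if self.frozen_pretrained_weights:
            self.eval()                # dropout / batch norm behave as at inference
            with torch.no_grad():      # autograd does not record the encoder pass
                output = self.encoders(batch)
        else:
            output = self.encoders(batch)
        return output

out = EncoderWrapper()(torch.randn(4, 10))  # out.requires_grad is False when frozen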