Correct way of freezing layers

Hello,

I have a model M and I am cloning it with M.clone().
Now I want to freeze certain layers of M.clone(). When I set requires_grad=False on their parameters, I get this error:

RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().

How can I freeze the layers of M.clone() in that case? I want to ensure that when I backpropagate the loss computed on a batch with M.clone(), the gradients are computed for the parameters of M.

A small script:

model = ResNet()
optimizer = Adam(model.parameters())
cloned_model = model.clone() # .clone() is a custom method that creates a copy of the model
for p in cloned_model.features.parameters():
    p.requires_grad = False
error = loss(cloned_model(data), labels)
error.backward()
optimizer.step()

Thank you!


@asura
Some small changes

model = ResNet()
optimizer = Adam(model.parameters())
cloned_model = model.clone()
for p in cloned_model.parameters():
    p.requires_grad = False
error = loss(cloned_model(data), labels)
error.backward()
optimizer.step()

Thanks @anantguptadbl! However, by writing cloned_model.features, I wanted to indicate that I don't want to freeze all layers, only some of them. Regarding my question above, can you suggest anything?

@asura

That can also be done.

  • ResNet has a lot of layers. If you want to freeze specific internal ResNet layers, you will have to do it manually, e.g.:

for p in cloned_model.layer1.parameters():
    p.requires_grad = False
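
A more general pattern (just a sketch; "layer1" is an example name prefix from torchvision's ResNet, so adapt the check to your own model's parameter names) is to filter by name:

# Freeze every parameter whose name starts with a given prefix
for name, p in cloned_model.named_parameters():
    if name.startswith("layer1"):
        p.requires_grad = False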

Thanks @anantguptadbl, ResNet() is just an example model; you could consider any model (say a CNN with 4 layers). That is exactly where my question lies: setting requires_grad = False on cloned_model.layer1.parameters() results in the RuntimeError I mentioned above.

@ptrblck thanks for your reply in the other thread.
I have one more issue, related to the loss problem above. Could you please give some insights on it? Once again, thanks a lot!

.clone() is a tensor method and is undefined for nn.Modules, so your code should already fail at model.clone() unless it's a custom method, which isn't posted here.
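
A quick way to see that (just a sketch; nn.Linear stands in for any module):

import torch
import torch.nn as nn

t = torch.randn(2, 2)
t_copy = t.clone()   # fine: clone() is a Tensor method

m = nn.Linear(2, 2)
m.clone()            # AttributeError: 'Linear' object has no attribute 'clone'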

My bad, it is indeed a custom method that creates a copy of the module, with the parameters produced via torch.clone().
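
That would explain the error: a tensor produced by torch.clone() is a non-leaf (it carries autograd history back to the original parameter), and requires_grad can only be changed on leaf tensors. A minimal repro, assuming the custom clone works roughly like this:

import torch

w = torch.randn(2, 2, requires_grad=True)  # leaf, like a parameter of the original model
w_cloned = w.clone()                       # non-leaf: result of an autograd op
w_cloned.requires_grad = False             # RuntimeError: you can only change requires_grad flags of leaf variables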

Depending how this custom method is implemented, you might want to call detach() on the parameters additionally or just use copy.deepcopy(model).
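
For the copy.deepcopy route, a minimal sketch (deepcopy creates fresh leaf parameters, so freezing works, but note that gradients then no longer flow back to the original model):

import copy
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
cloned_model = copy.deepcopy(model)      # parameters are new, independent leaf tensors
for p in cloned_model[0].parameters():   # freezing now works without the RuntimeError
    p.requires_grad = False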

Thank you @ptrblck! However, if I use either of those methods, will I still be able to compute gradients for the main model via error.backward()?

No, since you want to freeze the cloned model. If you want to keep the gradient history intact, don’t freeze the parameters of the cloned model:

weight = nn.Parameter(torch.randn(2, 2))
weight_cloned = weight.clone()

y = weight_cloned * 2
y.mean().backward()
print(weight.grad)
> tensor([[0.5000, 0.5000],
          [0.5000, 0.5000]])
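
Extending that to the module case, a sketch (assuming the custom clone calls torch.clone() on each parameter and the forward pass can be written functionally): detach the cloned parameters of the layers you want frozen and leave the rest connected, so error.backward() still populates the gradients of the original model:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Original model whose parameters should receive the gradients
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# "Frozen" layer: detach() cuts the graph back to model[0]
w1 = model[0].weight.clone().detach()
b1 = model[0].bias.clone().detach()
# Unfrozen layer: the clones stay connected to model[1]
w2 = model[1].weight.clone()
b2 = model[1].bias.clone()

x = torch.randn(8, 4)
out = F.linear(F.relu(F.linear(x, w1, b1)), w2, b2)
out.mean().backward()

print(model[0].weight.grad)  # None -> frozen
print(model[1].weight.grad)  # populated -> gradients flow to the original model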

Got it. Thanks a lot for your help!