Correct way of freezing layers

Hello,

I have a model M and I am cloning it with M.clone().
Now I want to freeze certain layers of M.clone(). When I set requires_grad=False on their parameters, I get this error:

RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().

How can I freeze the layers of M.clone() in that case? I want to ensure that when I backpropagate the loss computed on a batch with M.clone(), the gradients are computed for the parameters of M.

A small script:

model = ResNet()
optimizer = Adam(model.parameters())
cloned_model = model.clone() # .clone() is a custom method that creates a copy of the model
for p in cloned_model.features.parameters():
    p.requires_grad = False
error = loss(cloned_model(data), labels)
error.backward()
optimizer.step()

Thank you!


@asura
Some small changes

model = ResNet()
optimizer = Adam(model.parameters())
cloned_model = model.clone()
for p in cloned_model.parameters():
    p.requires_grad = False
error = loss(cloned_model(data), labels)
error.backward()
optimizer.step()

Thanks @anantguptadbl! However, by writing cloned_model.features, I wanted to indicate that I don't want to freeze all layers, only some of them. Regarding my question above, can you suggest anything?

@asura

That can also be done.

  • ResNet has a lot of layers. If you want to freeze specific internal ResNet layers, you will have to do it manually, e.g.:

for p in cloned_model.layer1.parameters():
    p.requires_grad = False
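
A more general pattern (just a sketch; "layer1" is an example name prefix from torchvision's ResNet, so adapt the check to your own model's parameter names) is to filter by name:

# Freeze every parameter whose name starts with a given prefix
for name, p in cloned_model.named_parameters():
    if name.startswith("layer1"):
        p.requires_grad = False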

Thanks @anantguptadbl, ResNet() is just an example model; you could consider any model (say a CNN with 4 layers). That is exactly where my question lies: setting requires_grad = False on cloned_model.layer1.parameters() results in the RuntimeError I mentioned above.

@ptrblck thanks for your reply in the other thread.
I have one more issue, related to the loss problem above. Could you please give some insights on it? Once again, thanks a lot!

.clone() is a tensor method and is undefined for nn.Modules, so your code should already fail at model.clone() unless it's a custom method, which isn't posted here.
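
A quick way to see that (just a sketch; nn.Linear stands in for any module):

import torch
import torch.nn as nn

t = torch.randn(2, 2)
t_copy = t.clone()   # fine: clone() is a Tensor method

m = nn.Linear(2, 2)
m.clone()            # AttributeError: 'Linear' object has no attribute 'clone'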

My bad, it is indeed a custom method that creates a copy of the module, with the parameters produced via torch.clone().
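
That would explain the error: a tensor produced by torch.clone() is a non-leaf (it carries autograd history back to the original parameter), and requires_grad can only be changed on leaf tensors. A minimal repro, assuming the custom clone works roughly like this:

import torch

w = torch.randn(2, 2, requires_grad=True)  # leaf, like a parameter of the original model
w_cloned = w.clone()                       # non-leaf: result of an autograd op
w_cloned.requires_grad = False             # RuntimeError: you can only change requires_grad flags of leaf variables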

Depending how this custom method is implemented, you might want to call detach() on the parameters additionally or just use copy.deepcopy(model).
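
For the copy.deepcopy route, a minimal sketch (deepcopy creates fresh leaf parameters, so freezing works, but note that gradients then no longer flow back to the original model):

import copy
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
cloned_model = copy.deepcopy(model)      # parameters are new, independent leaf tensors
for p in cloned_model[0].parameters():   # freezing now works without the RuntimeError
    p.requires_grad = False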

Thank you @ptrblck! However, if I use either of those methods, will I still be able to compute gradients for the main model via error.backward()?

No, since you want to freeze the cloned model. If you want to keep the gradient history intact, don’t freeze the parameters of the cloned model:

weight = nn.Parameter(torch.randn(2, 2))
weight_cloned = weight.clone()

y = weight_cloned * 2
y.mean().backward()
print(weight.grad)
> tensor([[0.5000, 0.5000],
          [0.5000, 0.5000]])
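
Extending that to the module case, a sketch (assuming the custom clone calls torch.clone() on each parameter and the forward pass can be written functionally): detach the cloned parameters of the layers you want frozen and leave the rest connected, so error.backward() still populates the gradients of the original model:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Original model whose parameters should receive the gradients
model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

# "Frozen" layer: detach() cuts the graph back to model[0]
w1 = model[0].weight.clone().detach()
b1 = model[0].bias.clone().detach()
# Unfrozen layer: the clones stay connected to model[1]
w2 = model[1].weight.clone()
b2 = model[1].bias.clone()

x = torch.randn(8, 4)
out = F.linear(F.relu(F.linear(x, w1, b1)), w2, b2)
out.mean().backward()

print(model[0].weight.grad)  # None -> frozen
print(model[1].weight.grad)  # populated -> gradients flow to the original model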

Got it. Thanks a lot for your help!