What is the typical implementation when you train an additional layer with its own optimizer?

Hi, I had a network called model() and it generated predictions like this:

pred1 = model(input1)
pred2 = model(input2)

all_preds = torch.concat([pred1, pred2]).flatten()
fc_new = nn.Linear(256, 10)
opt_new = torch.optim.Adam(fc_new.parameters(), lr=0.1)
loss_new = my_loss_func(all_preds, labels)
loss_new.backward()
opt_new.step()

But I got the following error:
RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

Basically, what I want to do is freeze “model” but train “fc_new”. However, “fc_new” needs to consume the output of “model” as its input.

Can someone give me some guidance on how to fix this problem? Thanks.

Try this:

optimizer = torch.optim.Adam(list(model.parameters())+list(fc_new.parameters()), lr = 0.1)
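
For context, here is a minimal sketch of how that single combined optimizer could be used in one training step, reusing the hypothetical names from the question (model, fc_new, input1, input2, labels, my_loss_func). Note that this updates model’s parameters as well, so by itself it does not freeze model:

import torch

optimizer = torch.optim.Adam(
    list(model.parameters()) + list(fc_new.parameters()), lr=0.1)

optimizer.zero_grad()                            # clear gradients from any previous step
all_preds = torch.cat([model(input1), model(input2)]).flatten()
loss = my_loss_func(fc_new(all_preds), labels)
loss.backward()                                  # backprop through both fc_new and model
optimizer.step()                                 # updates both sets of parameters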

Hi Jim!

Note, in the code you posted, you never actually use fc_new – you don’t
have fc_new consuming all_preds.

Based on what I think you want, you could do this:

fc_new = nn.Linear(256, 10)
opt_new = torch.optim.Adam(fc_new.parameters(), lr=0.1)

with torch.no_grad():   # won't construct a computation graph, so pred1 and pred2 won't carry requires_grad = True
    pred1 = model(input1)
    pred2 = model(input2)

all_preds = torch.concat([pred1, pred2]).flatten()

new_preds = fc_new(all_preds)   # which I think you want

opt_new.zero_grad()   # clear any stale gradients before the backward pass
loss_new = my_loss_func(new_preds, labels)
loss_new.backward()
opt_new.step()

This approach to “freezing” model is possible because model comes before
fc_new – that is, you are not feeding the output of fc_new into the part you
want frozen – and because you have easy access to model and fc_new
separately (rather than them both being part of some larger model that you
would have to dig down into).
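
If model and fc_new did live inside one larger module, a minimal sketch of the usual alternative (not from the reply above; the Wrapper class and the reuse of input1, input2, labels, and my_loss_func from the question are illustrative assumptions) is to switch off requires_grad on the frozen part and hand only fc_new’s parameters to the optimizer:

import torch
import torch.nn as nn

class Wrapper(nn.Module):              # hypothetical container holding both parts
    def __init__(self, model):
        super().__init__()
        self.model = model             # pretrained network to be frozen
        self.fc_new = nn.Linear(256, 10)

    def forward(self, x1, x2):
        pred1 = self.model(x1)
        pred2 = self.model(x2)
        return self.fc_new(torch.cat([pred1, pred2]).flatten())

net = Wrapper(model)                   # "model" is the pretrained network from the question
for p in net.model.parameters():
    p.requires_grad_(False)            # freeze: autograd computes no gradients for these weights

opt_new = torch.optim.Adam(net.fc_new.parameters(), lr=0.1)

opt_new.zero_grad()
loss_new = my_loss_func(net(input1, input2), labels)
loss_new.backward()                    # gradients reach only fc_new's weights
opt_new.step()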

Best.

K. Frank