How to optimize two models separately

ptnewbie · July 18, 2020, 7:58am

I have two models defined as:

from torch.optim import Adam

model1 = Net1()
model2 = Net2()
opt1 = Adam(model1.parameters(),…)
opt2 = Adam(model2.parameters(),…)

During training, model2 takes the output of model1 as its input, and each model has its own loss. I tried to train them like this:

output1 = model1(data)
output2 = model2(output1)
loss1 = Loss(output1)
loss2 = Loss(output2, labels)
loss = loss1+loss2

model1.zero_grad()
model2.zero_grad()
loss.backward()
opt1.step()
opt2.step()

With this approach, model1 does not train well. How can I train the two models separately? Only loss1 should be used to train model1. So I also tried:

model1.zero_grad()
loss1.backward()
opt1.step()
model1.eval()

model2.zero_grad()
loss2.backward()
opt2.step()
model1.train()

But I still cannot reproduce the results I get when training model1 alone, i.e., with all code related to model2 removed.
Thanks in advance.

ptrblck · July 18, 2020, 10:31am

The code looks generally alright.
If you don’t want loss2.backward() to calculate gradients in model1, you could detach output1 before passing it to model2 via:

output2 = model2(output1.detach())

This cuts the computation graph at this point, so no gradients from loss2 flow back into model1.
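For reference, here is a minimal sketch of a full training step with the detach in place. It assumes small nn.Linear layers as stand-ins for Net1 and Net2, a random batch, a learning rate of 1e-3, and placeholder losses; none of these come from the original post, so swap in your own models and loss functions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for Net1 and Net2 from the question.
model1 = nn.Linear(4, 3)
model2 = nn.Linear(3, 2)
opt1 = torch.optim.Adam(model1.parameters(), lr=1e-3)  # lr is an assumed value
opt2 = torch.optim.Adam(model2.parameters(), lr=1e-3)

# Random placeholder batch: 8 samples, 4 features, 2 classes.
data = torch.randn(8, 4)
labels = torch.randint(0, 2, (8,))

output1 = model1(data)
# detach() returns a tensor that shares data with output1 but has no
# autograd history, so model2's graph stops here.
output2 = model2(output1.detach())

loss1 = output1.pow(2).mean()             # placeholder for Loss(output1)
loss2 = F.cross_entropy(output2, labels)  # placeholder for Loss(output2, labels)

opt1.zero_grad()
opt2.zero_grad()

# loss2 only reaches model2; model1's parameters receive no gradient from it.
loss2.backward()
model1_untouched = all(p.grad is None for p in model1.parameters())

# loss1 only reaches model1.
loss1.backward()

opt1.step()
opt2.step()
```

In this sketch both backward() calls run before either step(), so no parameter is updated in place while a graph that still needs it is alive. The model1_untouched flag is only there to check that loss2 left model1 without gradients.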

ptnewbie · July 18, 2020, 11:30am

Yes, my code is correct. After changing the hyperparameters, model1 trains as well as it does as a single network. Thank you for your help!