How to have one data pair update part of the model and another update the whole model?

Hi, I have a specific question. In each training step I have two input pairs, (x1, y1) and (x2, y2), and the model is composed of two parts, model_a and model_b. The loss is computed as |y - model_b(model_a(x))|.
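
For concreteness, a minimal stand-in for this setup could look like the sketch below (the module shapes and the dummy data are just placeholders, not my actual models or data):

import torch
import torch.nn as nn

# hypothetical stand-ins for the two sub-networks
model_a = nn.Linear(16, 16)
model_b = nn.Linear(16, 1)

# hypothetical dummy data for the two pairs
x1, y1 = torch.randn(8, 16), torch.randn(8, 1)
x2, y2 = torch.randn(8, 16), torch.randn(8, 1)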

The problem is that I want the loss computed from the pair (x1, y1) to update the parameters of both model_a and model_b, while the loss computed from (x2, y2) should update only model_a's parameters. Is this possible?

I have tried the following: I used two optimizers, optim_a and optim_b, where optim_a updates the parameters of both model_a and model_b, while optim_b updates only the parameters of model_a.

optim_a = torch.optim.Adam(list(model_a.parameters()) + list(model_b.parameters()))
optim_b = torch.optim.Adam(model_a.parameters())

In the training phase, I compute both losses as follows:

loss1 = (y1 - model_b(model_a(x1))).abs().mean()
loss2 = (y2 - model_b(model_a(x2))).abs().mean()

Then,

optim_a.zero_grad()
optim_b.zero_grad()
loss1.backward()
optim_a.step()

loss2.backward()
optim_b.step()

However, when calling loss2.backward(), I get “RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”. I think this is because optim_a.step() has already modified the parameters of model_a and model_b that loss2’s graph still needs for its backward pass.

Any ideas on how to solve this problem? Need help!!

Would it work to do

  • require gradients for everything
  • compute loss1
  • set requires_grad to False for model_b (so that loss2 does not update it)
  • compute loss2
  • zero grad, compute backward of the total loss, take a step

As long as you detach anything from the loss1 computation that you reuse when computing loss2, it seems worth a try.
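
A minimal sketch of that recipe, assuming model_a, model_b and the data pairs x1, y1, x2, y2 are defined as in the question, and using a single Adam optimizer over both parameter sets instead of the two optimizers above:

import torch

# one optimizer covering both sub-networks (instead of optim_a / optim_b above)
optim = torch.optim.Adam(list(model_a.parameters()) + list(model_b.parameters()))

# 1. require gradients for everything (re-run at the start of every iteration)
for p in list(model_a.parameters()) + list(model_b.parameters()):
    p.requires_grad_(True)

# 2. compute loss1 while both sub-networks are trainable
loss1 = (y1 - model_b(model_a(x1))).abs().mean()

# 3. freeze model_b so that loss2 produces gradients only for model_a
for p in model_b.parameters():
    p.requires_grad_(False)

# 4. compute loss2: gradients still flow through model_b's ops to reach model_a,
#    but model_b's parameters get no gradient from this term
loss2 = (y2 - model_b(model_a(x2))).abs().mean()

# 5. one backward pass over the total loss, then one step; loss1 still sends
#    gradients into model_b because its graph was recorded before the freeze
optim.zero_grad()
(loss1 + loss2).backward()
optim.step()

Since the two forward passes share no intermediate tensors here, there is nothing that needs to be detached between them; in a full training loop, step 1 re-enables the gradients that step 3 switched off.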

Best regards

Thomas
