How to have one data pair update part of the model and another update the whole model?

Hi, I have a specific question. In each training step I have two input pairs, (x1, y1) and (x2, y2), and the model is composed of two parts, model_a and model_b. The loss is computed as |y - model_b(model_a(x))|.
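
For concreteness, a minimal stand-in for this setup could look like the sketch below (the module shapes and the dummy data are just placeholders, not my actual models or data):

import torch
import torch.nn as nn

# hypothetical stand-ins for the two sub-networks
model_a = nn.Linear(16, 16)
model_b = nn.Linear(16, 1)

# hypothetical dummy data for the two pairs
x1, y1 = torch.randn(8, 16), torch.randn(8, 1)
x2, y2 = torch.randn(8, 16), torch.randn(8, 1)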

The problem is that I want the loss computed from the pair (x1, y1) to update the parameters of both model_a and model_b, while the loss computed from (x2, y2) should update only model_a's parameters. Is this possible?

I have tried the following: I used two optimizers, optim_a and optim_b, where optim_a updates the parameters of both model_a and model_b, while optim_b updates only the parameters of model_a.

optim_a = torch.optim.Adam(list(model_a.parameters()) + list(model_b.parameters()))
optim_b = torch.optim.Adam(model_a.parameters())

In the training phase, I compute both losses as follows:

loss1 = (y1 - model_b(model_a(x1))).abs().mean()
loss2 = (y2 - model_b(model_a(x2))).abs().mean()

Then,

optim_a.zero_grad()
optim_b.zero_grad()
loss1.backward()
optim_a.step()

loss2.backward()
optim_b.step()

However, when calling loss2.backward(), I get “RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation”. I think this is because optim_a.step() has already modified the parameters of model_a and model_b that loss2’s graph still needs for its backward pass.

Any ideas on how to solve this problem? Need help!!

Would it work to do

  • require gradients for everything
  • compute loss1
  • set requires_grad to False for model_b (so that loss2 does not update it)
  • compute loss2
  • zero grad, compute backward of the total loss, take a step

As long as you detach anything from the loss1 computation that you reuse when computing loss2, it seems worth a try.
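
A minimal sketch of that recipe, assuming model_a, model_b and the data pairs x1, y1, x2, y2 are defined as in the question, and using a single Adam optimizer over both parameter sets instead of the two optimizers above:

import torch

# one optimizer covering both sub-networks (instead of optim_a / optim_b above)
optim = torch.optim.Adam(list(model_a.parameters()) + list(model_b.parameters()))

# 1. require gradients for everything (re-run at the start of every iteration)
for p in list(model_a.parameters()) + list(model_b.parameters()):
    p.requires_grad_(True)

# 2. compute loss1 while both sub-networks are trainable
loss1 = (y1 - model_b(model_a(x1))).abs().mean()

# 3. freeze model_b so that loss2 produces gradients only for model_a
for p in model_b.parameters():
    p.requires_grad_(False)

# 4. compute loss2: gradients still flow through model_b's ops to reach model_a,
#    but model_b's parameters get no gradient from this term
loss2 = (y2 - model_b(model_a(x2))).abs().mean()

# 5. one backward pass over the total loss, then one step; loss1 still sends
#    gradients into model_b because its graph was recorded before the freeze
optim.zero_grad()
(loss1 + loss2).backward()
optim.step()

Since the two forward passes share no intermediate tensors here, there is nothing that needs to be detached between them; in a full training loop, step 1 re-enables the gradients that step 3 switched off.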

Best regards

Thomas
