Hello,

I have two models with two separate loss functions (`loss1` and `loss2`) and two optimizers. The output of one model (a linear layer) is used as an input by the other. Based on some previous questions asked here, I was able to get this setup training by repackaging the output of the first model as a `Variable`:

```
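from torch.autograd import Variable

op1 = model1(data1)
op2 = Variable(op1.data)  # same values as op1, but no graph back to model1
op3 = model2(data2, op2)

optimizer1.zero_grad()
loss1 = criterion1(op1, target1)
loss1.backward()
optimizer1.step()

optimizer2.zero_grad()
loss2 = criterion2(op3, target2)
loss2.backward()
optimizer2.step()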
```
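A self-contained sketch of the same two-model pipeline that can be run as-is. The models, tensor sizes, MSE losses, and SGD optimizers are all hypothetical stand-ins for the post's `model1`, `model2`, `criterion1`, and `criterion2`. It uses `op1.detach()`, which in current PyTorch plays the role of `Variable(op1.data)`: the result has the same values as `op1` but no autograd history, so `loss2.backward()` stops at `op2` and never reaches `model1`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for model2: it consumes its own input plus model1's output.
class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4 + 3, 1)

    def forward(self, data2, op2):
        return self.fc(torch.cat([data2, op2], dim=1))

model1 = nn.Linear(5, 3)              # hypothetical stand-in for model1
model2 = Model2()
criterion1 = nn.MSELoss()
criterion2 = nn.MSELoss()
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.1)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)

data1, target1 = torch.randn(8, 5), torch.randn(8, 3)
data2, target2 = torch.randn(8, 4), torch.randn(8, 1)

op1 = model1(data1)
op2 = op1.detach()                    # same values as op1, no graph back to model1
op3 = model2(data2, op2)

optimizer1.zero_grad()
loss1 = criterion1(op1, target1)
loss1.backward()                      # gradients flow into model1 only
optimizer1.step()

optimizer2.zero_grad()
loss2 = criterion2(op3, target2)
loss2.backward()                      # gradients flow into model2 only; op2 is a leaf
optimizer2.step()
```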

Q. This seems to work, but how do I know whether each optimizer is getting the correct set of gradients? I assume it is, since each optimizer is associated with its own set of parameters (none shared), so the number of gradients has to be consistent.
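One way to check this, sketched under an assumed minimal setup (two hypothetical `nn.Linear` models and a toy loss): compare the parameter sets registered with each optimizer, then look at `.grad` on each model after a backward pass. A standard optimizer's `step()` only reads the `.grad` of the tensors in its own `param_groups`, so disjoint groups plus the expected `.grad` pattern is reasonable evidence that each optimizer sees the right gradients.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical two-stage setup: model1 feeds model2 through a detached tensor.
model1 = nn.Linear(5, 3)
model2 = nn.Linear(3, 1)
optimizer1 = torch.optim.SGD(model1.parameters(), lr=0.1)
optimizer2 = torch.optim.SGD(model2.parameters(), lr=0.1)

def param_ids(optimizer):
    """Identities of every tensor an optimizer will update."""
    return {id(p) for group in optimizer.param_groups for p in group["params"]}

# 1. No parameter is shared between the two optimizers.
disjoint = param_ids(optimizer1).isdisjoint(param_ids(optimizer2))

# 2. A loss computed from the detached output produces gradients for model2 only.
x = torch.randn(8, 5)
op2 = model1(x).detach()
loss2 = model2(op2).pow(2).mean()     # toy loss standing in for criterion2
optimizer1.zero_grad()
optimizer2.zero_grad()
loss2.backward()

# zero_grad() may leave .grad as None or as zeros depending on the PyTorch version.
model1_untouched = all(p.grad is None or not p.grad.any() for p in model1.parameters())
model2_has_grads = all(p.grad is not None for p in model2.parameters())
print(disjoint, model1_untouched, model2_has_grads)
```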

Q. I am now trying to use `loss2` to guide the optimization of `model1`. Does this make sense?

```
loss1 = criterion1(op1, target1)
loss1.backward()
loss2 = criterion2(op3, target2)
loss2 = loss2 + lam * loss1  # `lambda` is a Python keyword, so the weight needs another name
loss2.backward()
```

This gives me an error telling me to use `retain_variables=True` in my first `backward()` call. When I do this, the model does start training.
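A minimal reproduction of that error, assuming a hypothetical single linear `model1`, a weight `lam`, and a toy second loss. By default `backward()` frees the intermediate tensors the graph saved for the backward pass; the second call has to go back through `loss1`'s part of the graph and fails. Passing `retain_graph=True` (the current name of the argument that older PyTorch releases called `retain_variables`) keeps those tensors alive. In this sketch the two calls then accumulate into the same `.grad`, so without a `zero_grad()` in between, `loss1` contributes its gradient `1 + lam` times.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

model1 = nn.Linear(5, 3)                 # hypothetical stand-in for model1
x, target = torch.randn(8, 5), torch.randn(8, 3)
lam = 0.5                                # hypothetical loss weight

def backward_twice(retain):
    """Backprop loss1, then backprop a sum that still contains loss1."""
    model1.zero_grad()
    loss1 = F.mse_loss(model1(x), target)
    loss1.backward(retain_graph=retain)  # frees loss1's saved tensors unless retain=True
    loss2 = model1(x).pow(2).mean()      # hypothetical second loss on a fresh graph
    (loss2 + lam * loss1).backward()     # walks loss1's graph a second time
    return model1.weight.grad.clone()

try:
    backward_twice(retain=False)
    failed = False
except RuntimeError:                     # raised because loss1's graph was already freed
    failed = True

# With the graph retained, both calls succeed and their gradients accumulate:
# model1.weight.grad = d(loss1) + d(loss2 + lam * loss1).
grad = backward_twice(retain=True)
print(failed)
```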

Q. What is `retain_variables` doing? Is this the correct way to do what I want (i.e. guide the parameter updates of one model using the loss of the other), or does this just not make sense?
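A sketch of what the combined objective does to `model1`, assuming hypothetical linear models, MSE losses, and a weight `lam`. It compares `model1`'s gradient under `loss2 + lam * loss1` with the gradient of `lam * loss1` alone. When `model2` receives a detached copy of `op1`, the two match, meaning `loss2` has no path back to `model1`; when it receives `op1` itself, they differ. Building the summed loss first and calling `backward()` once also visits each part of the graph only once, so no graph has to be retained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins: model1's output feeds model2.
model1 = nn.Linear(5, 3)
model2 = nn.Linear(3, 1)
criterion = nn.MSELoss()
lam = 0.5  # weight on loss1

data1, target1 = torch.randn(8, 5), torch.randn(8, 3)
target2 = torch.randn(8, 1)

def model1_grad_combined(detach):
    """Gradient of loss2 + lam * loss1 with respect to model1's weight."""
    model1.zero_grad()
    model2.zero_grad()
    op1 = model1(data1)
    op2 = op1.detach() if detach else op1   # detach() cuts loss2's path to model1
    loss1 = criterion(op1, target1)
    loss2 = criterion(model2(op2), target2)
    (loss2 + lam * loss1).backward()        # single backward pass: no retain_graph needed
    return model1.weight.grad.clone()

def model1_grad_loss1_only():
    """Gradient of lam * loss1 alone, for comparison."""
    model1.zero_grad()
    (lam * criterion(model1(data1), target1)).backward()
    return model1.weight.grad.clone()

g_ref = model1_grad_loss1_only()
g_detached = model1_grad_combined(detach=True)
g_attached = model1_grad_combined(detach=False)

print(torch.allclose(g_detached, g_ref))   # loss2 adds nothing when op2 is detached
print(torch.allclose(g_attached, g_ref))   # loss2 does reach model1 without detach
```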

Thanks,

Gautam