I have the model depicted in the figure. Model 1 and model 2 used to be two disjoint models: they worked as a pipeline where we first trained model 1 to convergence and then fed its preprocessed outputs to model 2 as inputs. I am now training them end to end, and I am struggling with how to integrate the two losses instead of just using `loss2`.

Another question: can I use two different optimizers, one for each model's parameters?
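
For concreteness, this is the kind of setup I mean (a minimal sketch; the module shapes and learning rates are placeholders, not my actual architecture):

```python
import torch

# Placeholder stand-ins for the two sub-models in the figure.
model1 = torch.nn.Linear(16, 8)
model2 = torch.nn.Linear(8, 4)

# One optimizer per sub-model, each with its own learning rate.
opt1 = torch.optim.Adam(model1.parameters(), lr=1e-3)
opt2 = torch.optim.Adam(model2.parameters(), lr=1e-4)
```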

Should I call `loss1.backward()` first to accumulate gradients for model 1, and then call `loss2.backward()`, which will accumulate gradients for both model 1's and model 2's parameters? Do you think this is a good idea, where gradients accumulate from both losses and a controlled learning rate (or loss weighting) lets me force model 1 to learn more from `loss1` than from `loss2`?
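
Concretely, I imagine something like this (a sketch continuing from the snippet above; the inputs, targets, and MSE losses are placeholders for my real ones):

```python
import torch.nn.functional as F

x = torch.randn(32, 16)   # placeholder batch
y1 = torch.randn(32, 8)   # placeholder target for model 1's output
y2 = torch.randn(32, 4)   # placeholder target for model 2's output

mid = model1(x)           # this intermediate output is the point `c` in the figure
out = model2(mid)

loss1 = F.mse_loss(mid, y1)
loss2 = F.mse_loss(out, y2)

opt1.zero_grad()
opt2.zero_grad()

# retain_graph=True is needed here: loss2's backward pass reuses model1's
# forward graph, whose buffers would otherwise be freed by the first backward.
loss1.backward(retain_graph=True)
loss2.backward()  # gradients from loss2 are *added* into model1's existing .grad

opt1.step()
opt2.step()
```

If I understand autograd correctly, `.grad` accumulates across the two `backward()` calls, so model 1's parameters end up with the sum of both gradients; scaling one loss before its backward call, e.g. `(0.1 * loss2).backward()`, would be one way to control that balance.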

Another idea that came to my mind is to sum `loss1` and `loss2` (let's call the sum `loss3`) and backpropagate that. My initial intuition is that backpropagating `loss3` will propagate only `loss2`'s gradient until it reaches `c`, and from `c` backwards it will propagate the weighted sum of both. Is that right?

Any ideas or references to the literature will be appreciated.