My question is about training multiple models end-to-end. Suppose I have three models. The first model takes the input and produces an intermediate representation. The second model modifies that intermediate representation, and the first model then continues processing the modified version. Finally, the third model operates on the output of the first model.
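To make the data flow concrete, here is a minimal PyTorch sketch of the setup. The class names, layer sizes, the `encode`/`decode` split of the first model, and the residual form of the second model are all placeholders chosen for illustration, not my actual architecture.

```python
import torch
import torch.nn as nn


class FirstModel(nn.Module):
    """Hypothetical main model, split in two so the second model can sit in between."""

    def __init__(self, in_dim=8, hidden_dim=16, out_dim=4):
        super().__init__()
        self.front = nn.Linear(in_dim, hidden_dim)
        self.back = nn.Linear(hidden_dim, out_dim)

    def encode(self, x):
        # Input -> intermediate representation.
        return torch.relu(self.front(x))

    def decode(self, h):
        # (Modified) intermediate representation -> final output.
        return self.back(h)


class SecondModel(nn.Module):
    """Hypothetical guidance model that modifies the intermediate representation."""

    def __init__(self, hidden_dim=16):
        super().__init__()
        self.net = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, h):
        # Residual update is an assumption; any shape-preserving map would do.
        return h + self.net(h)


class ThirdModel(nn.Module):
    """Hypothetical feedback model that works on the first model's output."""

    def __init__(self, out_dim=4):
        super().__init__()
        self.net = nn.Linear(out_dim, 1)

    def forward(self, y):
        return self.net(y)


def forward_all(m1, m2, m3, x):
    h = m1.encode(x)      # first model, first half
    h_mod = m2(h)         # second model modifies the representation
    y = m1.decode(h_mod)  # first model continues on the modified representation
    feedback = m3(y)      # third model works on the first model's output
    return h, h_mod, y, feedback


torch.manual_seed(0)
m1, m2, m3 = FirstModel(), SecondModel(), ThirdModel()
x = torch.randn(5, 8)
h, h_mod, y, feedback = forward_all(m1, m2, m3, x)
```

Since everything is connected in one computation graph, a loss computed on `feedback` can send gradients back through all three models.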
How should I train these networks? I see two options, but I’m not sure which one is correct in practice (or whether there is a third option).
First option: Each model has its own loss, its own backward pass, and its own optimizer step.
Second option: There is only one loss, which is a (weighted) sum of all the models’ losses, followed by a single backward pass on that combined loss and then a step of each model’s optimizer.
Which one is correct?
Please note that the end goal is the output of the first model. The second model is added as guidance, and the third model provides additional feedback.
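For concreteness, this is roughly how I would write the two options in PyTorch. The stand-in layers, the three loss definitions, the loss weights, and the function names `step_separate` and `step_combined` are placeholders made up for illustration; only the backward/step pattern matters. In the first option I run every backward pass before any optimizer step, because my understanding is that stepping changes parameters in place while the later backward passes still need the saved graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder stand-ins for the three models (sizes are arbitrary).
m1_front = nn.Linear(8, 16)   # first model, before the second model
m1_back = nn.Linear(16, 4)    # first model, after the second model
m2 = nn.Linear(16, 16)        # second model (guidance)
m3 = nn.Linear(4, 1)          # third model (feedback)

opt1 = torch.optim.SGD(
    list(m1_front.parameters()) + list(m1_back.parameters()), lr=0.01
)
opt2 = torch.optim.SGD(m2.parameters(), lr=0.01)
opt3 = torch.optim.SGD(m3.parameters(), lr=0.01)
optimizers = [opt1, opt2, opt3]


def compute_losses(x, target):
    h = torch.relu(m1_front(x))
    h_mod = h + m2(h)
    y = m1_back(h_mod)
    feedback = m3(y)
    # All three loss definitions below are assumptions for the sketch.
    loss1 = nn.functional.mse_loss(y, target)  # main task loss
    loss2 = (h_mod - h).pow(2).mean()          # guidance loss
    loss3 = feedback.pow(2).mean()             # feedback loss
    return loss1, loss2, loss3


def step_separate(x, target):
    """Option 1: one backward call per loss, then every optimizer steps."""
    for opt in optimizers:
        opt.zero_grad()
    losses = compute_losses(x, target)
    for i, loss in enumerate(losses):
        # The losses share one graph, so keep it alive until the last backward.
        loss.backward(retain_graph=(i < len(losses) - 1))
    for opt in optimizers:
        opt.step()
    return [loss.item() for loss in losses]


def step_combined(x, target, weights=(1.0, 0.5, 0.1)):
    """Option 2: one weighted sum, one backward call, then every optimizer steps."""
    for opt in optimizers:
        opt.zero_grad()
    losses = compute_losses(x, target)
    total = sum(w * loss for w, loss in zip(weights, losses))
    total.backward()
    for opt in optimizers:
        opt.step()
    return total.item()


x = torch.randn(5, 8)
target = torch.randn(5, 4)
```

Written this way, the gradients from the separate backward calls accumulate in each parameter's `.grad`, so the first option appears to differ from the second only in the loss weights.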