Sorry, no tutorial-like examples come to mind, as such models are usually big, with lots of model-specific details. But this seems to be a common question on this forum (when people ask about "freezing layers", "combining models" and such, variants of staged training are usually what's implied); see this topic for example.
The simplest implementation would look like this (if I understood your intent correctly):

```python
output = red_network(input)
if stage2:
    output = combine_tensors(output.detach(), blue_network(input))
```
Filtering the optimizer's parameter list, or setting `requires_grad = False` on `red_network`'s parameters, achieves a similar effect.
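A minimal runnable sketch of the `detach()` approach, using small `nn.Linear` stand-ins for `red_network` and `blue_network` and plain addition as a stand-in for your `combine_tensors` (both are assumptions, your actual modules will differ):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real networks
red_network = nn.Linear(4, 4)
blue_network = nn.Linear(4, 4)

x = torch.randn(2, 4)

# Stage 2: red's output is detached, so gradients only flow into blue_network
output = red_network(x).detach() + blue_network(x)
loss = output.sum()
loss.backward()

# blue_network received gradients; red_network did not
assert blue_network.weight.grad is not None
assert red_network.weight.grad is None
```

If you instead freeze via `requires_grad_(False)` or optimizer filtering, you can skip the `detach()` call and the effect on red's parameters is the same.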
But note that in this form `blue_network` may be learning `red_network`'s errors (residuals). I don't understand how you want your network's loss to change once `blue_network` is added. If you use `output = blue_network(input, red_network(input))`, that's a residual network; if your `blue_output` and `red_output` must be independently useful, you want "model combining" instead.
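For the residual variant, a hedged sketch of what `blue_network(input, red_network(input))` might look like (the `BlueResidual` module and its concatenation-based design are my assumptions, not something from your code):

```python
import torch
import torch.nn as nn

class BlueResidual(nn.Module):
    """Hypothetical blue network that sees both the raw input and red's output."""
    def __init__(self, dim):
        super().__init__()
        self.fc = nn.Linear(dim * 2, dim)

    def forward(self, x, red_out):
        # Residual form: blue predicts a correction on top of red's output
        return red_out + self.fc(torch.cat([x, red_out], dim=-1))

red_network = nn.Linear(4, 4)
blue_network = BlueResidual(4)

x = torch.randn(2, 4)
out = blue_network(x, red_network(x).detach())  # detach if red stays frozen
assert out.shape == (2, 4)
```

Here `blue_output` is only meaningful together with `red_output`; if you need each head to stand on its own, average or otherwise ensemble two independent outputs instead.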