Training multiple models together in block coordinate descent fashion

I am trying to train multiple models together in a block coordinate descent fashion, i.e., computing losses using all of the models but backpropagating the aggregated loss through only one of them at a time to update its parameters. To do this, I have set up two forward passes: the outputs of the first forward pass are detached, and in the second forward pass only the model currently being updated is kept in the computational graph. The losses are aggregated and backpropagated through that model.
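A minimal sketch of what I mean, using two toy linear models (the model names, sizes, and optimizer are placeholders for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical models; in my real setup there are more
model_a = nn.Linear(4, 1)
model_b = nn.Linear(4, 1)
x = torch.randn(8, 4)
target = torch.randn(8, 1)
loss_fn = nn.MSELoss()

# First forward pass: outputs of the "frozen" models, detached from the graph
with torch.no_grad():
    out_b_detached = model_b(x)

# Second forward pass: only model_a (the block being updated) stays in the graph
out_a = model_a(x)

# Aggregate losses from all models, then backprop through model_a only
loss = loss_fn(out_a, target) + loss_fn(out_b_detached, target)

opt_a = torch.optim.SGD(model_a.parameters(), lr=0.1)
opt_a.zero_grad()
loss.backward()
opt_a.step()

# model_b receives no gradients; model_a does
print(model_b.weight.grad)        # None
print(model_a.weight.grad is not None)  # True
```

In the real setup this is repeated cyclically, swapping which model is kept in the graph on each step.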

I have the following questions:

  1. Are there more efficient ways of setting this up?