I have two models (M1 and M2) with two losses (L1 and L2), and two separate optimizers.
The output from M1 is fed as an input to M2, which makes M1 part of the computational graph of M2 (right?)

Consequently, the parameters of M1 will receive a gradient signal from L2 (the loss of the second model). I would like to prevent this from happening, i.e. M1 should learn only from L1 and M2 only from L2. How can I do this?

I am guessing that I need to use a detach() somewhere (like in the DCGAN example), but I really don't know where exactly. (I had a previous question on how detach() works and I am still not super clear.)
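For what it's worth, here is a minimal sketch of where the detach() would go. The models, losses, and shapes below are hypothetical stand-ins (two small `nn.Linear` layers with MSE losses), not the actual M1/M2 from the question:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for M1 and M2; the real models and losses will differ.
m1 = nn.Linear(4, 4)
m2 = nn.Linear(4, 2)
opt1 = torch.optim.SGD(m1.parameters(), lr=0.1)
opt2 = torch.optim.SGD(m2.parameters(), lr=0.1)

x = torch.randn(8, 4)
t1 = torch.randn(8, 4)   # assumed target for L1
t2 = torch.randn(8, 2)   # assumed target for L2

out1 = m1(x)
loss1 = nn.functional.mse_loss(out1, t1)

# detach() cuts the graph between the two models: gradients from loss2
# stop at m2's input and never reach m1's parameters.
out2 = m2(out1.detach())
loss2 = nn.functional.mse_loss(out2, t2)

opt1.zero_grad()
loss1.backward()
g1 = m1.weight.grad.clone()  # snapshot: m1's grad from loss1 alone
opt1.step()

opt2.zero_grad()
loss2.backward()  # no gradient flows into m1 here
opt2.step()
```

After both backward calls, m1's gradient is still exactly the one loss1 produced, since loss2's graph ends at the detached tensor.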

Looks correct… detach() creates a new Variable that shares the data with the original variable but not the graph; kind of like Variable(op1.data).
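A quick sketch of that "shares the data but not the graph" behavior, using the modern tensor API rather than the old Variable wrapper:

```python
import torch

a = torch.randn(3, requires_grad=True)
t = a * 2          # lives on the autograd graph
d = t.detach()     # new tensor: same underlying storage, no graph

# d shares storage with t...
assert d.data_ptr() == t.data_ptr()
# ...but carries no graph, so nothing backpropagates through d
assert t.requires_grad and not d.requires_grad
```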

@ruotianluo Sorry, but I had another question in there to ask.
So L1.backward() will give gradients to model 1 only, and L2.backward() will give gradients to model 2 only.

If I do L1 = L1 + lambda*L2, followed by L1.backward(), will L2 still have no effect on the parameters of model 1, or will something change w.r.t. the parameters of model 1?
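As long as the detach() is in place, the combined loss changes nothing for model 1: the lambda*L2 term has no path back to its parameters, so the gradient it receives is identical to what L1 alone would give. (Without the detach, model 1 would additionally get a lambda-scaled gradient from L2.) A sketch with hypothetical toy models and losses:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
m1 = nn.Linear(4, 4)   # hypothetical stand-in for model 1
m2 = nn.Linear(4, 2)   # hypothetical stand-in for model 2
x = torch.randn(8, 4)
lam = 0.5              # the lambda weighting, chosen arbitrarily

out1 = m1(x)
out2 = m2(out1.detach())   # graph cut between the two models
l1 = out1.pow(2).mean()    # toy loss for model 1
l2 = out2.pow(2).mean()    # toy loss for model 2

# gradient of l1 alone w.r.t. m1's weight, kept for comparison
g_l1_only = torch.autograd.grad(l1, m1.weight, retain_graph=True)[0]

total = l1 + lam * l2
total.backward()

# With detach() in place, the combined backward gives m1 exactly the
# gradient it would get from l1 alone; l2 contributes nothing to it.
```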