Help with computational graph / Where to detach()


I have two models (M1 and M2), two losses (L1 and L2), and two optimizers.
The output from M1 is fed as an input to M2, which makes M1 part of the computational graph of M2 (right?)

Consequently, the parameters of M1 will receive gradient signal from L2 (loss of the second model). I would like to prevent this from happening, i.e. M1 only learns from L1 and M2 only learns from L2. How can I do this?

I am guessing that I need to use a detach() somewhere (like in the DCGAN example), but I really don’t know where exactly. (I had a previous question on how detach() works and I am still not super clear)

Any help would be greatly appreciated.



Would this do what I want?

op1 = model1(data1)
op2 = model2(data1,op1.detach())


Looks correct. detach() creates a new Variable that shares only the data with the original Variable, not the graph; kind of like Variable(
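To make the effect of detach() concrete, here is a minimal sketch. It assumes recent PyTorch, where plain tensors have replaced Variable. The two nn.Linear layers and the squared-output loss are hypothetical stand-ins for M1, M2 and L2, and model2 takes only the detached output here, not (data1, op1) as in the question.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for M1 and M2 (not the poster's actual models).
model1 = nn.Linear(4, 3)
model2 = nn.Linear(3, 1)

data1 = torch.randn(8, 4)

op1 = model1(data1)
detached = op1.detach()      # same underlying data, no graph history
op2 = model2(detached)       # model1 is not part of op2's graph

# Stand-in for L2: any scalar computed from model2's output.
loss2 = op2.pow(2).mean()
loss2.backward()

# Gradients stop at the detach point, so model1 receives nothing from loss2.
m1_grads = [p.grad for p in model1.parameters()]
m2_grads = [p.grad for p in model2.parameters()]
shares_data = detached.data_ptr() == op1.data_ptr()
```

After backward(), every entry of m1_grads should still be None while model2's parameters have gradients, which is the separation asked for in the original question.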


@ruotianluo Thanks for the reply!

Do you mean if I did:
op1 = model1(data1)
op2 = Variable(
op3 = model2(data1,op2)

It would have the same effect?
I was doing this before but I thought that this was sharing the graph along with the data.

Thanks again


@ruotianluo Sorry, but I had another question to ask.
So L1.backward() will give gradients to model 1 only, and L2.backward() will give gradients to model 2 only?

If I do L1 = L1 + lambda*L2, followed by L1.backward(), will L2 still have no effect on the parameters of model 1? Or will something change w.r.t. the parameters of model 1?

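In this case, L2 will still not give gradients to model 1. Since op1 was detached before it was passed to model 2, L2 has no path back to model 1's parameters, whether you call backward() on it directly or as part of a sum.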
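A quick way to check this is to compare model 1's gradients from L1 alone with those from L1 + lambda*L2. The sketch below assumes recent PyTorch; the layers, the two losses, and lam = 0.5 are all hypothetical placeholders, and model2 takes only the detached output for brevity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the two models, data, and the weight lambda.
model1 = nn.Linear(4, 3)
model2 = nn.Linear(3, 1)
data1 = torch.randn(8, 4)
lam = 0.5

def model1_grads(include_l2):
    """Return model1's gradients after backward() on L1 or on L1 + lam*L2."""
    model1.zero_grad()
    model2.zero_grad()
    op1 = model1(data1)
    loss1 = op1.pow(2).mean()                    # stand-in for L1
    loss2 = model2(op1.detach()).pow(2).mean()   # stand-in for L2, graph cut at op1
    total = loss1 + lam * loss2 if include_l2 else loss1
    total.backward()
    return [p.grad.clone() for p in model1.parameters()]

grads_l1_only = model1_grads(include_l2=False)
grads_combined = model1_grads(include_l2=True)

# True if adding lam*L2 left model1's gradients unchanged.
same = all(torch.allclose(a, b) for a, b in zip(grads_l1_only, grads_combined))
```

Note that the reverse does not hold: with the combined loss, model 2 still gets gradients from L2, scaled by lambda.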

@ruotianluo Thank you for the clarification