This is a basic question about how PyTorch computation graphs work in a slightly nontrivial case. The specific code/algorithm below doesn't really matter, but I don't have a better example on hand to illustrate my question, so bear with me:

My observation comes from @kostrikov's PPO source code, lines 197-211 of main.py. I've made some changes for the sake of simplicity to arrive at the following pseudo-code, which illustrates the computation-graph mechanics I'm unsure about:

```python
output1 = model(Variable(batch))
output2 = old_model(Variable(batch))
# wrap output2.data in a fresh Variable so it is cut off from output2's graph
ratio = torch.exp(output1 - Variable(output2.data))
optimizer.zero_grad()
ratio.backward()
optimizer.step()  # optimizer was built over model.parameters()
```
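For concreteness, here is a self-contained sketch of the same pattern using the modern tensor API, where `.detach()` plays the role of `Variable(output2.data)`; the two `Linear` modules are just hypothetical stand-ins for `model`/`old_model`:

```python
import torch

# hypothetical stand-ins for model / old_model
model = torch.nn.Linear(3, 1)
old_model = torch.nn.Linear(3, 1)

batch = torch.randn(4, 3)
output1 = model(batch)
output2 = old_model(batch)

# .detach() is the modern equivalent of Variable(output2.data):
# it shares storage with output2 but is severed from the graph
ratio = torch.exp(output1 - output2.detach())
ratio.sum().backward()

# gradients flow into model's parameters ...
assert all(p.grad is not None for p in model.parameters())
# ... but not into old_model's, because output2 was detached
assert all(p.grad is None for p in old_model.parameters())
```

At least in current PyTorch, this is the behaviour I would expect from both snippets.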

My observation is that the above code works correctly, but this very similar code does NOT:

```python
batch_var = Variable(batch)  # single input Variable shared by both models
output1 = model(batch_var)
output2 = old_model(batch_var)
ratio = torch.exp(output1 - Variable(output2.data))
optimizer.zero_grad()
ratio.backward()
optimizer.step()  # optimizer was built over model.parameters()
```

I don't understand why these routines produce different results. As far as I understand, the `Variable(output2.data)` in the definition of `ratio` should prevent `backward()` from propagating into the original `output2` graph and changing gradients that way. I also don't see why merely sharing an input Variable between the two models should change the result of `backward()` (the `forward()` of `model`/`old_model` contains no in-place operations).
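One way I've been sanity-checking this reasoning is by inspecting `grad_fn`: a tensor rebuilt from `.data` (or `.detach()`ed) has no `grad_fn`, so `backward()` has nothing to follow back into the old graph. A minimal sketch, with a `Linear` layer standing in for `old_model`:

```python
import torch

layer = torch.nn.Linear(3, 1)  # hypothetical stand-in for old_model
out = layer(torch.randn(2, 3))

print(out.grad_fn)           # an autograd node (part of the graph)
print(out.detach().grad_fn)  # None: the detached tensor is a graph leaf
```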

I assume something is missing from my understanding. Any insights?