This is a basic question about how PyTorch computation graphs work in a slightly nontrivial case. The specific code/algorithm below doesn't really matter, but I don't have a better example on hand to illustrate my question, so bear with me:

My observation comes from @kostrikov's PPO source code, lines 197-211 of main.py. I've made some changes for the sake of simplicity to arrive at the following pseudo-code, which illustrates the computation-graph mechanics I'm unsure about:

```python
output1 = model(Variable(batch))
output2 = old_model(Variable(batch))
# wrap output2.data in a fresh Variable so it is cut off from output2's graph
ratio = torch.exp(output1 - Variable(output2.data))
optimizer.zero_grad()
ratio.backward()
optimizer.step()  # optimizer was built over model.parameters()
```
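For concreteness, here is a self-contained sketch of the same pattern using the modern tensor API, where `.detach()` plays the role of `Variable(output2.data)`; the two `Linear` modules are just hypothetical stand-ins for `model`/`old_model`:

```python
import torch

# hypothetical stand-ins for model / old_model
model = torch.nn.Linear(3, 1)
old_model = torch.nn.Linear(3, 1)

batch = torch.randn(4, 3)
output1 = model(batch)
output2 = old_model(batch)

# .detach() is the modern equivalent of Variable(output2.data):
# it shares storage with output2 but is severed from the graph
ratio = torch.exp(output1 - output2.detach())
ratio.sum().backward()

# gradients flow into model's parameters ...
assert all(p.grad is not None for p in model.parameters())
# ... but not into old_model's, because output2 was detached
assert all(p.grad is None for p in old_model.parameters())
```

At least in current PyTorch, this is the behaviour I would expect from both snippets.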

My observation is that the above code works correctly, but this very similar code does NOT:

```python
batch_var = Variable(batch)  # single input Variable shared by both models
output1 = model(batch_var)
output2 = old_model(batch_var)
ratio = torch.exp(output1 - Variable(output2.data))
optimizer.zero_grad()
ratio.backward()
optimizer.step()  # optimizer was built over model.parameters()
```

I don't understand why these routines produce different results. As far as I understand, the `Variable(output2.data)` in the definition of `ratio` should prevent `backward()` from propagating into the original `output2` graph and changing gradients that way. I also don't see why merely sharing an input Variable between the two models should change the result of `backward()` (the `forward()` of `model`/`old_model` contains no in-place operations).
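One way I've been sanity-checking this reasoning is by inspecting `grad_fn`: a tensor rebuilt from `.data` (or `.detach()`ed) has no `grad_fn`, so `backward()` has nothing to follow back into the old graph. A minimal sketch, with a `Linear` layer standing in for `old_model`:

```python
import torch

layer = torch.nn.Linear(3, 1)  # hypothetical stand-in for old_model
out = layer(torch.randn(2, 3))

print(out.grad_fn)           # an autograd node (part of the graph)
print(out.detach().grad_fn)  # None: the detached tensor is a graph leaf
```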

I assume something is missing from my understanding. Any insights?