Best practices for copying Variable data

Hello,

I’m working on a PPO implementation. PPO is an RL algorithm in which the policy update is constrained to stay in the neighborhood of the previous policy.

To do so, the algorithm relies on a ratio that compares the probabilities of the chosen actions under the old and new policies.
So, in my implementation I:

  1. Compute the log prob coming from the old policy
  2. Compute the log prob coming from the new policy
  3. Evaluate the ratio (see the sketch after this list)
  4. Clone the new policy into the old one
  5. Update the new policy

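For reference, here is a minimal sketch of steps 1 to 3, assuming a toy policy module that exposes a log_prob(states, actions) helper (all names and sizes here are illustrative, not my actual code):

    import torch
    import torch.nn as nn
    from torch.distributions import Categorical

    # Hypothetical minimal policy; the real one can be anything that
    # exposes per-action log probabilities.
    class Policy(nn.Module):
        def __init__(self, obs_dim=4, n_actions=2):
            super().__init__()
            self.net = nn.Linear(obs_dim, n_actions)

        def log_prob(self, states, actions):
            return Categorical(logits=self.net(states)).log_prob(actions)

    old_policy, new_policy = Policy(), Policy()
    states = torch.randn(8, 4)
    actions = torch.randint(0, 2, (8,))

    with torch.no_grad():
        old_log_prob = old_policy.log_prob(states, actions)  # step 1

    new_log_prob = new_policy.log_prob(states, actions)      # step 2

    # Step 3: ratio pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(new_log_prob - old_log_prob)
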
I realized that my ratio is always exactly equal to one. Could it be because I copy the network using this method:

    for p_source, p_target in zip(self.parameters(), clone.parameters()):
        p_target.data = p_source.data

Thanks!

Hi,

Why not just do: clone.load_state_dict(self.state_dict())?
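
Something like this, for example (a minimal sketch; the Linear layer is just a stand-in for whatever nn.Module your policy is):

    import copy
    import torch
    import torch.nn as nn

    # Stand-in for the real policy network.
    new_policy = nn.Linear(4, 2)
    old_policy = copy.deepcopy(new_policy)

    # Step 4: snapshot the current policy into the old one.
    old_policy.load_state_dict(new_policy.state_dict())

    # load_state_dict copies the values into old_policy's own tensors,
    # so later updates to new_policy leave old_policy untouched.
    with torch.no_grad():
        new_policy.weight.add_(1.0)  # pretend this is an optimizer step
    print(torch.equal(old_policy.weight, new_policy.weight))  # False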


Definitely. I think what happened is that, with the old method, the parameters of both networks ended up at the same memory address.

Thanks!

If you do p_target.data = p_source.data, the data doesn’t get copied. Python just rebinds the reference, so p_target.data and p_source.data point to the same underlying storage; both networks then share the same parameter tensors, which is why the ratio stays at one.
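
You can see the difference directly by comparing storage pointers (a small sketch with two throwaway Linear layers):

    import torch
    import torch.nn as nn

    src, dst = nn.Linear(4, 2), nn.Linear(4, 2)

    # Rebinding .data only copies the reference: both parameters now
    # share the same underlying storage.
    dst.weight.data = src.weight.data
    print(dst.weight.data_ptr() == src.weight.data_ptr())   # True

    # An in-place copy_ (or load_state_dict) writes the values into
    # the target's own storage instead.
    dst2 = nn.Linear(4, 2)
    dst2.weight.data.copy_(src.weight.data)
    print(dst2.weight.data_ptr() == src.weight.data_ptr())  # False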