Hello,

I’m working on a PPO implementation. PPO is an RL algorithm in which we constrain the policy updates to stay in a neighborhood of the previous policy.

To do so, the algorithm relies on a probability ratio that compares the probability of the sampled actions under the new policy to their probability under the old policy.

So, in my implementation I:

- Compute the log-probabilities of the sampled actions under the old policy
- Compute the log-probabilities of the same actions under the new policy
- Evaluate the ratio
- Clone the new policy into the old one
- Update the new policy
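The ratio computation in the steps above can be sketched like this (a minimal, framework-free sketch; the probabilities and the `log_prob_*` names are just placeholders, not my actual code):

```python
import math

# Hypothetical log-probabilities of the same sampled action under each policy.
log_prob_old = math.log(0.25)  # pi_old(a|s)
log_prob_new = math.log(0.30)  # pi_new(a|s)

# PPO importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
ratio = math.exp(log_prob_new - log_prob_old)
print(ratio)  # ~1.2; two identical policies would give exactly 1.0
```

If the two policies are genuinely different, this ratio should drift away from 1 as training progresses.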

I noticed that my ratio is always exactly equal to one. Could it be because I copy the network using this method:

```python
for p_source, p_target in zip(self.parameters(), clone.parameters()):
p_target.data = p_source.data
```
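If it helps, here is the behaviour I suspect, illustrated with plain Python lists standing in for the parameter tensors (rebinding shares the object, whereas an explicit copy keeps the two independent):

```python
# Rebinding: "target" now refers to the very same list object as "source",
# so in-place updates to source are visible through target.
source = [1.0, 2.0]
target = [0.0, 0.0]
target = source
source[0] = 5.0
print(target[0])  # 5.0 -- the "clone" tracks the source

# An actual copy keeps them independent.
target2 = list(source)
source[0] = 9.0
print(target2[0])  # still 5.0
```

So my worry is that the assignment above makes the old policy share its parameters with the new one, which would explain a ratio that is always one.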

Thanks!