Hello,

I’m working on a PPO implementation. PPO is an RL algorithm in which we constrain the policy updates to stay in a neighborhood of the previous policy.

To do so, the algorithm relies on a probability ratio that compares the probability of the sampled actions under the new policy to their probability under the old policy.

So, in my implementation I:

- Compute the log-probabilities of the sampled actions under the old policy
- Compute the log-probabilities of the same actions under the new policy
- Evaluate the ratio
- Clone the new policy into the old one
- Update the new policy
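The ratio computation in the steps above can be sketched like this (a minimal, framework-free sketch; the probabilities and the `log_prob_*` names are just placeholders, not my actual code):

```python
import math

# Hypothetical log-probabilities of the same sampled action under each policy.
log_prob_old = math.log(0.25)  # pi_old(a|s)
log_prob_new = math.log(0.30)  # pi_new(a|s)

# PPO importance ratio pi_new(a|s) / pi_old(a|s), computed in log space.
ratio = math.exp(log_prob_new - log_prob_old)
print(ratio)  # ~1.2; two identical policies would give exactly 1.0
```

If the two policies are genuinely different, this ratio should drift away from 1 as training progresses.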

I noticed that my ratio is always exactly equal to one. Could it be because I copy the network using this method:

```python
for p_source, p_target in zip(self.parameters(), clone.parameters()):
p_target.data = p_source.data
```
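If it helps, here is the behaviour I suspect, illustrated with plain Python lists standing in for the parameter tensors (rebinding shares the object, whereas an explicit copy keeps the two independent):

```python
# Rebinding: "target" now refers to the very same list object as "source",
# so in-place updates to source are visible through target.
source = [1.0, 2.0]
target = [0.0, 0.0]
target = source
source[0] = 5.0
print(target[0])  # 5.0 -- the "clone" tracks the source

# An actual copy keeps them independent.
target2 = list(source)
source[0] = 9.0
print(target2[0])  # still 5.0
```

So my worry is that the assignment above makes the old policy share its parameters with the new one, which would explain a ratio that is always one.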

Thanks!