What's the right way of implementing policy gradient?

I edited my answer above 🙂
Let me know if that helps!
