What's the right way of implementing policy gradient?

11118 · August 9, 2017, 12:11pm

Hmm, here is the full formula
The policy is the weight of loss.grad (the gradient of the loss), not the weight of the loss itself.
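
To make the distinction concrete, here is a small sketch of one possible reading of that remark. The formula itself isn't reproduced above, so everything here is an assumption for illustration: a softmax policy over three actions, hypothetical rewards, and a per-action loss of -log pi(a) * R(a). It uses plain Python with finite differences instead of an autograd library. Version (a) multiplies each per-action gradient by pi(a), with pi(a) treated as a constant. Version (b) multiplies the loss by pi(a) first and then differentiates, which also differentiates through the weight.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def numerical_grad(f, theta, eps=1e-6):
    # Central finite differences; stands in for autograd in this sketch.
    grad = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        down = list(theta)
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad

# Hypothetical values, chosen only for illustration.
rewards = [1.0, 0.0, -1.0]
theta = [0.2, -0.1, 0.4]
n = len(theta)

def per_action_loss(a):
    # Assumed per-action loss: -log pi(a) * R(a).
    return lambda th: -math.log(softmax(th)[a]) * rewards[a]

# (a) Policy weights the GRADIENT: pi(a) is held fixed as a constant.
pi = softmax(theta)
grad_weighted = [0.0] * n
for a in range(n):
    g = numerical_grad(per_action_loss(a), theta)
    for i in range(n):
        grad_weighted[i] += pi[a] * g[i]

# (b) Policy weights the LOSS: pi(a) is differentiated as well.
def weighted_loss(th):
    p = softmax(th)
    return sum(p[a] * (-math.log(p[a]) * rewards[a]) for a in range(n))

grad_of_weighted_loss = numerical_grad(weighted_loss, theta)

# Reference: gradient of the negative expected reward under the policy.
def neg_expected_reward(th):
    p = softmax(th)
    return -sum(p[a] * rewards[a] for a in range(n))

grad_reference = numerical_grad(neg_expected_reward, theta)

print("(a) weighted gradient:     ", grad_weighted)
print("(b) grad of weighted loss: ", grad_of_weighted_loss)
print("reference -dE[R]/dtheta:   ", grad_reference)
```

Under these assumptions, (a) matches the gradient of the negative expected reward, while (b) picks up an extra term from differentiating the weight. In an autograd framework, (a) roughly corresponds to treating the weight as a constant so that no gradient flows through it.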