Hello!

I have implemented the policy gradient algorithm with the following loss:

`loss = -torch.mean(log_prob*discounted_rewards)`

where `log_prob` is a tensor with the log-probabilities of the actions that were taken, and `discounted_rewards` is a tensor with the corresponding discounted reward for each action.
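To make the setup concrete, here is a minimal self-contained sketch of how I compute that loss (the shapes, logits, and reward values below are just toy placeholders, not my real network):

```python
import torch
from torch.distributions import Categorical

# Hypothetical episode: 4 timesteps, 3 discrete actions.
logits = torch.randn(4, 3, requires_grad=True)            # policy network outputs
actions = torch.tensor([0, 2, 1, 0])                      # actions actually taken
discounted_rewards = torch.tensor([1.0, 0.9, 0.81, 0.729])  # return G_t per step

dist = Categorical(logits=logits)
log_prob = dist.log_prob(actions)   # log pi(a_t | s_t), shape (4,)

# REINFORCE-style loss: negative mean of log-prob weighted by return.
loss = -torch.mean(log_prob * discounted_rewards)
loss.backward()                     # gradients flow back into `logits`
```

Minimizing this loss increases the probability of actions that received high discounted reward.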

Is this a correct implementation of the policy gradient loss? Can I use this approach instead of `action.reinforce(r)`?

Thanks!