A question about the reinforcement learning example

xwgeng · May 12, 2017, 3:25pm

hi, guys

Below is REINFORCE update rule.

Instead of log probability, the reinforcement learning example uses the probability to compute the gradient. Is it correct ?

jekbradbury · May 13, 2017, 12:22am

The gradient of the log of the probability (in the case of the multinomial distribution that’s used in the example) is negative the reciprocal of the probability, which is what you’ll find in the multinomial function’s backward definition.

xwgeng · May 13, 2017, 3:04am

Thanks for your reply.