The probability of a path for a continuous action space is 0

Xiaodong_Jia · January 28, 2018, 12:25am

Hi. When the action space A is continuous and the policy is, for example, a normal distribution, the probability of any particular path is 0. I'm curious how to deal with this when implementing, for example, PPO or TRPO, where the probabilities along a path are multiplied together. There must be a formal explanation for it, but I haven't found one.
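One common way to make this precise is to work with the path density instead of the path probability. The following is a sketch that assumes the initial-state distribution ρ_0 and the transition model P also have densities, and that both policies put positive density on the sampled actions. The notation (τ, ρ_0, P, θ_old) is introduced here for illustration. For a trajectory τ = (s_0, a_0, s_1, ..., s_T):

$$
p_\theta(\tau) = \rho_0(s_0) \prod_{t=0}^{T-1} \pi_\theta(a_t \mid s_t)\, P(s_{t+1} \mid s_t, a_t),
\qquad
\frac{p_\theta(\tau)}{p_{\theta_{\text{old}}}(\tau)} = \prod_{t=0}^{T-1} \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}
$$

The environment terms ρ_0 and P cancel in the ratio, so only policy densities remain. A ratio of densities is a finite, well-defined number even though every individual path has probability 0.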

Thanks for your help.

keithmgould · February 15, 2018, 9:42pm

The probability of any specific action drawn from a normal distribution is always 0. Look at probability densities instead: when working with continuous actions, you can usually use the probability density of an action in place of its probability (which is what you use for discrete actions).
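Here is a minimal sketch of that substitution. It assumes a one-dimensional Gaussian policy whose mean and standard deviation have already been computed at each visited state. The function names are hypothetical. The per-step density ratio is the quantity that PPO-style objectives typically use. Summing log-densities and then exponentiating gives the whole-path ratio without multiplying many small numbers directly.

```python
import math

# Hypothetical helpers for illustration; in practice a library computes these.

def gaussian_log_density(a, mu, sigma):
    """Log of the normal density N(a; mu, sigma^2).

    This is a log-density, not a log-probability: for small sigma the
    density exceeds 1, so the returned value can be positive.
    """
    z = (a - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def per_step_ratios(actions, mus_new, sigmas_new, mus_old, sigmas_old):
    """pi_new(a_t|s_t) / pi_old(a_t|s_t) for each step, computed from log-densities.

    mus_* and sigmas_* are the policy outputs at each visited state s_t.
    """
    return [
        math.exp(gaussian_log_density(a, mn, sn) - gaussian_log_density(a, mo, so))
        for a, mn, sn, mo, so in zip(actions, mus_new, sigmas_new, mus_old, sigmas_old)
    ]


def trajectory_log_ratio(actions, mus_new, sigmas_new, mus_old, sigmas_old):
    """log(p_new(tau) / p_old(tau)); environment terms cancel, so only the
    policy log-densities are summed."""
    return sum(
        gaussian_log_density(a, mn, sn) - gaussian_log_density(a, mo, so)
        for a, mn, sn, mo, so in zip(actions, mus_new, sigmas_new, mus_old, sigmas_old)
    )


# Usage: a two-step path with 1-D actions.
actions = [0.3, -1.2]
ratios = per_step_ratios(actions, [0.1, -1.0], [0.5, 0.4], [0.0, -0.9], [0.5, 0.5])
path_ratio = math.exp(
    trajectory_log_ratio(actions, [0.1, -1.0], [0.5, 0.4], [0.0, -0.9], [0.5, 0.5])
)
# path_ratio equals the product of the per-step ratios, up to rounding.
print(ratios, path_ratio)
```

Working in log space is the usual choice because densities of long paths can underflow or overflow. In PyTorch, for instance, `torch.distributions.Normal(mu, sigma).log_prob(a)` returns the same log-density.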

Xiaodong_Jia · February 15, 2018, 9:57pm

Thanks for your reply.
In fact, I've checked the implementations and found that the probability density is used instead. I'm not sure whether this is well known, since I haven't seen a paper mention it. Could you point me to a formal reference for it?

keithmgould · February 15, 2018, 10:01pm

See Section 13.7 of Sutton and Barto's Reinforcement Learning: An Introduction (2nd edition).
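That section covers this case. The policy is a Gaussian density over a real-valued action, and its mean μ(s,θ) and standard deviation σ(s,θ) are parameterized functions of the state. Below is a sketch of that form together with the log-density derivatives that policy-gradient updates rely on. They are written with respect to μ and σ; apply the chain rule through μ(s,θ) and σ(s,θ) to get derivatives with respect to θ.

$$
\pi(a \mid s, \theta) = \frac{1}{\sigma(s,\theta)\sqrt{2\pi}} \exp\!\left(-\frac{(a-\mu(s,\theta))^2}{2\,\sigma(s,\theta)^2}\right),
\qquad
\frac{\partial \ln \pi}{\partial \mu} = \frac{a-\mu}{\sigma^2},
\qquad
\frac{\partial \ln \pi}{\partial \sigma} = \frac{(a-\mu)^2}{\sigma^3} - \frac{1}{\sigma}
$$

Only the density and its logarithm appear in these expressions. Nothing requires π(a | s, θ) to be a probability.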