The probability of a path for a continuous action space is 0

(Edward) #1

Hi. When the action space A is continuous and the policy is, for example, a normal distribution, the probability of any particular path is 0. I’m curious how to deal with this when implementing, for example, PPO or TRPO, where the action probabilities along a path are multiplied together. There must be a formal explanation for it, but I haven’t found one :frowning:

Thanks for your help.

(Keith Gould) #2

Under a continuous distribution such as a normal, the probability of any single exact action is always 0. Look at probability densities instead. When working with continuous actions, you can usually use the probability density of an action in place of the probability (which is what you use for discrete actions).
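
To make that concrete, here is a minimal sketch of how a likelihood ratio between two Gaussian policies could be computed from densities. It assumes a one-dimensional action and a policy that outputs a mean and standard deviation per step; the function names are made up for illustration and are not taken from any particular PPO or TRPO implementation. Working with log densities and summing them avoids multiplying many small numbers.

```python
import math


def gaussian_log_density(action, mean, std):
    """Log of the normal probability density at `action`.

    This is a density, not a probability, so it can be positive
    (the density itself can exceed 1 when std is small).
    """
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def path_log_density(actions, means, stds):
    """Sum of per-step log densities of the actions along one path.

    Only the policy terms are included. The environment dynamics are the
    same under both policies, so they cancel in the ratio below.
    """
    return sum(
        gaussian_log_density(a, m, s) for a, m, s in zip(actions, means, stds)
    )


def density_ratio(actions, new_means, new_stds, old_means, old_stds):
    """Ratio of new-policy to old-policy density for the same actions."""
    log_ratio = path_log_density(actions, new_means, new_stds) - path_log_density(
        actions, old_means, old_stds
    )
    return math.exp(log_ratio)


if __name__ == "__main__":
    actions = [0.3, -1.2]
    old_means, old_stds = [0.0, 0.0], [1.0, 1.0]
    new_means, new_stds = [0.1, -0.2], [1.0, 1.0]
    print(density_ratio(actions, new_means, new_stds, old_means, old_stds))
```

Even though each path has probability 0, the ratio of the two densities is a well-defined finite number as long as the old density is nonzero, and that ratio is the quantity the surrogate objectives need.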

(Edward) #3

Thanks for your reply :slight_smile:
In fact, I’ve checked some implementations and found that the probability density is used instead. I’m not sure whether this is a well-known fact, since I haven’t seen a paper mention it. Could you point me to a formal reference for it?

(Keith Gould) #4

See section 13.7 of Sutton and Barto’s book, Reinforcement Learning: An Introduction.
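
As a rough sketch of the idea (my own notation, assuming a scalar action and a policy whose mean $\mu_\theta(s)$ and standard deviation $\sigma_\theta(s)$ are functions of the state), a Gaussian policy is defined through its density, and the gradient of the log density is all a policy gradient method needs:

```latex
\pi_\theta(a \mid s)
  = \frac{1}{\sigma_\theta(s)\sqrt{2\pi}}
    \exp\!\left( -\frac{\bigl(a - \mu_\theta(s)\bigr)^2}{2\,\sigma_\theta(s)^2} \right),
\qquad
\frac{\partial \ln \pi_\theta(a \mid s)}{\partial \mu_\theta(s)}
  = \frac{a - \mu_\theta(s)}{\sigma_\theta(s)^2},
\qquad
\frac{\partial \ln \pi_\theta(a \mid s)}{\partial \sigma_\theta(s)}
  = \frac{\bigl(a - \mu_\theta(s)\bigr)^2 - \sigma_\theta(s)^2}{\sigma_\theta(s)^3}.
```

Here $\pi_\theta(a \mid s)$ is a density over actions rather than a probability, so it can be nonzero (and even greater than 1) at a point whose probability is 0.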