Hi. When the action space A is continuous and the policy is, for example, a normal distribution, the probability of any single path is 0. I'm curious how to deal with this when implementing PPO and TRPO, where the probabilities along a path need to be multiplied together. There must be a formal explanation for this, but I haven't found one.
Thanks for your help.
The probability of any single action drawn from a normal distribution is always 0; look into probability densities. When working with continuous actions, you can use the probability density of an action wherever the probability mass would be used for discrete actions.
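To make this concrete, here is a minimal sketch of how the density substitutes for the probability in PPO's importance ratio. All of the numbers (action, means, standard deviations) are hypothetical, and the ratio is computed via log densities for numerical stability, as most implementations do:

```python
import math

def normal_logpdf(x, mu, sigma):
    # Log of the density of N(mu, sigma^2) at x -- this replaces
    # log pi(a|s) for a discrete policy.
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu)**2 / (2 * sigma**2)

# Hypothetical sampled action and policy parameters before/after an update.
action = 0.3
old_mu, old_sigma = 0.0, 1.0
new_mu, new_sigma = 0.1, 0.9

# PPO's ratio pi_new(a|s) / pi_old(a|s), computed with densities:
# exp(log pi_new - log pi_old). The 0-probability issue never arises
# because only the ratio of densities is needed.
ratio = math.exp(normal_logpdf(action, new_mu, new_sigma)
                 - normal_logpdf(action, old_mu, old_sigma))
print(ratio)
```

The same trick covers the path probability: a product of densities along a trajectory becomes a sum of log densities, and in both PPO and TRPO the dynamics terms cancel in the ratio, leaving only the per-step policy density ratios.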
Thanks for your reply
In fact, I’ve checked some implementations and found that the probability density is used instead. I’m not sure whether this is a well-known fact, since I haven’t seen a paper mention it. Could you give me a formal reference for it?
See Section 13.7 of Sutton and Barto’s Reinforcement Learning book, which covers policy parameterization for continuous actions.