Hi. When the action space A is continuous and the policy is, for example, a normal distribution, the probability of any particular path is 0. I'm curious how this is dealt with when implementing, for example, PPO or TRPO, where the probabilities along a path are multiplied together. There must be a formal explanation for this, but I haven't found it.
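To make the question concrete, here is a minimal sketch of the quantities involved, assuming a one-dimensional Gaussian policy with a fixed standard deviation. It evaluates the probability density (not the probability) of each action and forms the ratio between a new and an old policy. The function names, the means, and the sampled actions are made up for illustration, and the environment's transition terms are left out on the assumption that they are identical under both policies and cancel in the ratio.

```python
import math


def gaussian_log_density(action, mean, std):
    """Log of the normal pdf at `action`. This is a density, so it is finite
    even though the probability of any single action is zero."""
    return (
        -0.5 * ((action - mean) / std) ** 2
        - math.log(std)
        - 0.5 * math.log(2.0 * math.pi)
    )


def path_log_density(actions, means, std):
    """Sum of per-step log densities, i.e. the log of the product of densities."""
    return sum(gaussian_log_density(a, m, std) for a, m in zip(actions, means))


# Hypothetical three-step path: actions taken, and the mean each policy
# would output at the corresponding states (illustrative numbers only).
actions = [0.1, -0.3, 0.7]
old_means = [0.0, 0.0, 0.5]
new_means = [0.05, -0.1, 0.6]
std = 0.5

# Per-step density ratios, computed in log space for numerical stability.
step_ratios = [
    math.exp(gaussian_log_density(a, mn, std) - gaussian_log_density(a, mo, std))
    for a, mo, mn in zip(actions, old_means, new_means)
]

# Ratio for the whole path: product of the per-step ratios.
path_ratio = math.exp(
    path_log_density(actions, new_means, std)
    - path_log_density(actions, old_means, std)
)

# A density can exceed 1, which a probability never can.
peak_density = math.exp(gaussian_log_density(0.0, 0.0, 0.1))

print(step_ratios, path_ratio, peak_density)
```

In this sketch every ratio is a finite positive number even though each individual action has probability zero, which is the behavior I would like to see justified formally.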
Thanks for your help.