Hi, I’m new to reinforcement learning and have recently started using A3C for continuous control.
My action, i.e. the output of the actor network, is a vector of size (64*1), and these 64 values are used as probabilities, so each of them should lie in the range (0,1). I use a normal distribution with a decaying variance to sample the actual action and to calculate the log-probabilities, but when I do so, the sampled action can fall outside that range.
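To make the setup concrete, here is a minimal NumPy sketch of roughly what I mean. It is not my actual A3C code: the random `mean` vector stands in for the actor output, `std = 0.3` is just a made-up value for the decayed standard deviation, and `sample_action` is a hypothetical helper name. It only shows that sampling around values already in (0,1) easily produces entries outside that range.

```python
import numpy as np

def sample_action(mean, std, rng):
    """Sample from a diagonal Gaussian N(mean, std^2) and return (action, log_prob)."""
    action = mean + std * rng.standard_normal(mean.shape)
    # Log-density of the diagonal Gaussian, summed over the action dimensions.
    log_prob = np.sum(
        -0.5 * ((action - mean) / std) ** 2
        - np.log(std)
        - 0.5 * np.log(2.0 * np.pi)
    )
    return action, log_prob

rng = np.random.default_rng(0)
mean = rng.uniform(0.0, 1.0, size=64)  # assumption: stand-in for the actor output in (0, 1)
std = 0.3                              # assumption: current value of the decayed std
action, log_prob = sample_action(mean, std, rng)

# Count how many of the 64 sampled values left the valid range.
out_of_range = int(np.sum((action < 0.0) | (action > 1.0)))
print("entries outside [0, 1]:", out_of_range)
```

With a larger `std` more entries leave the range, and clipping `action` afterwards means the log-probability above no longer matches the value that is actually executed.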
I tried clipping the action to between 0 and 1, but that distorts the normal distribution and leads to bad results. Have you ever had the same problem? Any advice? Thanks!