Best sampling function for A3C prob

Hi,

I am confused about which probability distribution sampling function is best for training an A3C reinforcement learning model.

Could I get some advice from people with experience in this?

Thanks,
Granth

I guess you are asking about sampling functions for the continuous domain and not the discrete domain?

Hi,

I am working in the discrete domain.

Actually, I have tried multinomial and categorical sampling, but both get stuck: to avoid negative rewards, the agent keeps choosing an action which does nothing.

Can you please let me know of any sampling function that also tries the low-probability actions?

Also, is it ok to sample a random action while training A3C?

Thanks,
Granth

The problem is that your network has converged and keeps outputting “invalid” actions, assigning them a high probability and the “valid” ones a low probability. You should check your implementation rather than blaming the distribution itself.
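One rough way to check whether the policy has collapsed like this is to look at the entropy of the action probabilities it outputs. The sketch below is plain Python and only an illustration: the logits are made-up values, and `softmax` and `entropy` are hypothetical helpers, not functions from any particular library.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical policy outputs for 4 discrete actions.
collapsed_logits = [10.0, 0.0, 0.0, 0.0]  # almost all mass on one action
uniform_logits = [0.0, 0.0, 0.0, 0.0]     # no preference between actions

collapsed_entropy = entropy(softmax(collapsed_logits))
uniform_entropy = entropy(softmax(uniform_logits))
max_entropy = math.log(len(uniform_logits))  # upper bound for 4 actions

print(collapsed_entropy, uniform_entropy, max_entropy)
```

If the entropy of your policy drops close to zero early in training, the other actions are almost never sampled, no matter which sampling function you use.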

A2C, A3C and other policy-based methods rely on sampling from a distribution to calculate the log probability they need, so sampling is not only “ok” but a “must”.
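To make this concrete, here is a minimal sketch in plain Python of sampling a discrete action from the policy's probabilities and using its log probability in the policy loss term. The probabilities, the advantage value and the `sample_action` helper are assumptions for illustration only; a real implementation would take them from the network and the critic.

```python
import math
import random

def sample_action(probs, rng):
    # Draw one action index with probability proportional to probs.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical policy output for 3 discrete actions.
probs = [0.7, 0.2, 0.1]
rng = random.Random(0)  # fixed seed so the run is repeatable

action = sample_action(probs, rng)
log_prob = math.log(probs[action])  # log probability of the sampled action

advantage = 1.5                      # made-up advantage estimate
policy_loss = -log_prob * advantage  # policy-gradient term for this step

# Sampling already visits low-probability actions, in proportion to
# their probability, as long as that probability is not close to zero.
counts = [0, 0, 0]
for _ in range(10000):
    counts[sample_action(probs, rng)] += 1

print(action, policy_loss, counts)
```

Categorical sampling does try the low-probability actions too, just less often, so a policy that never tries them usually points to near-zero probabilities rather than to a problem with the sampling function.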