Best sampling function for A3C prob

granth_jain · September 2, 2020, 7:46pm

Hi,

I am confused about which probability-distribution sampling function is best for training an A3C reinforcement learning model.

May I get some advice from those with experience?

Thanks,
Granth

iffiX · September 3, 2020, 1:58am

I guess you are asking about sampling functions in the continuous domain rather than the discrete domain?

granth_jain · September 3, 2020, 6:02pm

Hi,

I am working in the discrete domain.

Actually, I have tried multinomial and categorical sampling, but both get stuck: to avoid negative rewards, the agent keeps choosing an action that does nothing.

Could you please point me to a sampling function that also tries the low-probability actions?

Also, is it OK to sample a random action while training A3C?

Thanks,
Granth

iffiX · September 5, 2020, 6:30am

The problem is that your network has converged and keeps outputting “invalid” actions, assigning them high probability and the “valid” ones low probability. You should check your implementation rather than blaming the distribution itself.
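
One rough way to check whether the policy has collapsed like this is to look at the entropy of the action probabilities. The sketch below uses only the Python standard library; the helper names `softmax` and `entropy` and the example logits are made up for illustration and are not taken from the original poster's code.

```python
import math

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats; zero-probability entries contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical policy outputs for 4 discrete actions.
collapsed_logits = [12.0, 0.0, 0.0, 0.0]   # one action dominates
fresh_logits = [0.1, 0.0, -0.1, 0.05]      # close to uniform

collapsed_entropy = entropy(softmax(collapsed_logits))
fresh_entropy = entropy(softmax(fresh_logits))
max_entropy = math.log(len(fresh_logits))  # entropy of a uniform policy

print("collapsed:", collapsed_entropy)
print("fresh:", fresh_entropy)
print("uniform upper bound:", max_entropy)
```

If the entropy sits near zero early in training, sampling will almost always return the same action regardless of which sampling function is used.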

A2C, A3C, and other policy-based methods rely on sampling from a distribution to calculate the log probability they need, so sampling is not only “ok” but a “must”.
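
To illustrate the point, here is a minimal standard-library sketch of sampling an action from a categorical policy and forming the log-probability term of the policy-gradient loss. In PyTorch, `torch.distributions.Categorical` provides the same two steps through `sample()` and `log_prob()`. The names `sample_action` and `policy_loss_term`, the logits, and the fixed advantage of 1.0 are assumptions for illustration only.

```python
import math
import random

def softmax(logits):
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_action(probs, rng):
    # Inverse-CDF sampling: each action is drawn in proportion to its probability.
    u = rng.random()
    cumulative = 0.0
    for action, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return action
    return len(probs) - 1  # guard against floating-point rounding

def policy_loss_term(logits, advantage, rng):
    # Sample an action, then weight its negative log probability by the advantage.
    probs = softmax(logits)
    action = sample_action(probs, rng)
    log_prob = math.log(probs[action])
    return action, -log_prob * advantage

rng = random.Random(0)          # seeded so the run is repeatable
logits = [2.0, 0.0, -1.0]       # hypothetical policy output for 3 actions
counts = [0, 0, 0]
for step in range(10000):
    action, loss = policy_loss_term(logits, 1.0, rng)
    counts[action] += 1

print(counts)
```

As long as the policy assigns an action a nonzero probability, that action still gets sampled occasionally, so low-probability actions are tried without switching to a different sampling function.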