[resolved] Actor Critic with a large number of possible actions

Right now your model outputs a softmax, which parameterizes a categorical distribution over the actions. Instead, have your model output the mean and standard deviation of a Gaussian, and sample from that Gaussian to choose your action.
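
Here is a rough sketch of what that could look like, in plain Python so it runs without any framework. The names (`gaussian_policy_head`, `w_mean`, `w_log_std`) and the toy numbers are made up for illustration, and the single linear layer stands in for whatever network you actually use. It assumes a one-dimensional action and predicts the log of the standard deviation so that the standard deviation stays positive.

```python
import math
import random


def gaussian_policy_head(features, w_mean, w_log_std):
    """Map a feature vector to the mean and std of a 1-D Gaussian.

    Hypothetical stand-in for the last layer of an actor network.
    """
    mean = sum(f * w for f, w in zip(features, w_mean))
    log_std = sum(f * w for f, w in zip(features, w_log_std))
    # Exponentiating keeps the standard deviation strictly positive.
    std = math.exp(log_std)
    return mean, std


def sample_action(mean, std, rng):
    """Draw one action from N(mean, std^2)."""
    return rng.gauss(mean, std)


def log_prob(action, mean, std):
    """Log-density of `action` under N(mean, std^2).

    This takes the place of log(softmax[action]) in the actor loss.
    """
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2 * math.pi)


# Toy inputs and weights (assumed values, not from the original post).
rng = random.Random(0)
features = [0.5, -1.0, 2.0]
w_mean = [0.1, 0.2, 0.3]
w_log_std = [0.0, 0.1, -0.05]

mean, std = gaussian_policy_head(features, w_mean, w_log_std)
action = sample_action(mean, std, rng)
print(action, log_prob(action, mean, std))
```

The actor loss keeps the same shape as before: the log-probability of the sampled action, weighted by the advantage from the critic. If you are in PyTorch, `torch.distributions.Normal` gives you `sample()` and `log_prob()` so you don't have to write the density by hand.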
