Right now your model outputs a softmax, which represents a categorical distribution over a discrete set of actions. Instead, have your model output the mean and standard deviation of a Gaussian, then sample from that Gaussian to choose your action.
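Here is a minimal sketch of that idea in plain Python, assuming a 1-D continuous action and a policy-gradient setup where you also need the log-probability of the sampled action. `GaussianPolicyHead`, `gaussian_log_prob`, and `sample_action` are hypothetical names for illustration, and the "model" is just a tiny linear layer standing in for your network's final layer. If you are using PyTorch, `torch.distributions.Normal(mean, std)` with its `sample()` and `log_prob()` methods covers the sampling and log-probability parts for you.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class GaussianPolicyHead:
    """Hypothetical linear policy head: maps an observation vector to the
    mean and standard deviation of a 1-D Gaussian over actions."""
    w_mean: list[float]
    b_mean: float
    w_log_std: list[float]
    b_log_std: float

    def forward(self, obs: list[float]) -> tuple[float, float]:
        # One output predicts the mean of the action distribution.
        mean = sum(w * x for w, x in zip(self.w_mean, obs)) + self.b_mean
        # The other predicts log(std); exponentiating keeps std strictly positive.
        log_std = sum(w * x for w, x in zip(self.w_log_std, obs)) + self.b_log_std
        return mean, math.exp(log_std)


def gaussian_log_prob(action: float, mean: float, std: float) -> float:
    # log N(a | mu, sigma^2) = -(a - mu)^2 / (2 sigma^2) - log(sigma) - 0.5 log(2 pi)
    return (-((action - mean) ** 2) / (2 * std ** 2)
            - math.log(std)
            - 0.5 * math.log(2 * math.pi))


def sample_action(head: GaussianPolicyHead, obs: list[float],
                  rng: random.Random = random) -> tuple[float, float]:
    # Replaces "softmax + categorical sample" with "mean/std + Gaussian sample".
    mean, std = head.forward(obs)
    action = rng.gauss(mean, std)
    return action, gaussian_log_prob(action, mean, std)


if __name__ == "__main__":
    head = GaussianPolicyHead(w_mean=[1.0, 2.0], b_mean=0.5,
                              w_log_std=[0.0, 0.0], b_log_std=0.0)
    action, log_prob = sample_action(head, [1.0, 1.0], rng=random.Random(0))
    print(f"action={action:.3f}, log_prob={log_prob:.3f}")
```

Predicting `log(std)` rather than `std` directly is a common choice because it lets the network output any real number while the standard deviation stays positive. If your environment's actions are bounded, you will probably also want to clip or squash the sampled value before passing it to the environment.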