Why we use Categorical

Can someone please explain why we use Categorical? I can’t figure it out. What do we get from the Categorical output?

import torch
from torch.distributions import Categorical

def predict(state):
    # Select an action (0 or 1) by running the policy model
    # and sampling from the action probabilities it outputs
    state = torch.from_numpy(state).type(torch.FloatTensor)
    action_probs = policy(state)
    distribution = Categorical(action_probs)
    action = distribution.sample()
    return action
distribution = Categorical(action_probs)
action = distribution.sample()

As I understand it, Categorical takes the probability of every possible action, and one action is then sampled from that distribution. You can’t just do argmax(action_probs), because then there would be no exploration in some sense; also, the intention is to increase the probability of good actions and decrease the probability of bad ones, which is why we have to sample from the distribution instead of always choosing the highest value.
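For example, here is a small made-up illustration (the probabilities below are invented, not taken from the code above) of why sampling differs from taking the argmax:

import torch
from torch.distributions import Categorical

action_probs = torch.tensor([0.7, 0.3])  # hypothetical output of the policy network

# Sampling: action 1 is still chosen roughly 30% of the time, so there is exploration
distribution = Categorical(action_probs)
sampled_actions = [distribution.sample().item() for _ in range(10)]
print(sampled_actions)  # e.g. [0, 1, 0, 0, 0, 1, 0, 0, 0, 1]

# Argmax: always picks action 0, so the agent never explores action 1
greedy_action = torch.argmax(action_probs).item()
print(greedy_action)    # always 0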

I am fairly new to reinforcement learning and Python, so please correct me if I am wrong.

Why we use Categorical (or any other distribution) is a direct result of the policy formulation, where the action is defined to be sampled from a distribution, a ~ π(·|s), with probability of action = π(a|s). It’s an implementation of the algorithm. You may change the distribution (e.g. to a Gaussian for continuous actions), but you may not change it to an argmax, because that is fundamentally a different algorithm/sampling mechanism (argmax is not a distribution).
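As a rough sketch of that for the continuous case (the mean and std tensors below are placeholders for what a policy network over a 1-dimensional continuous action space might output, not code from the original post), the same pattern with a Gaussian looks like:

import torch
from torch.distributions import Normal

# Placeholder values standing in for the outputs of a policy network
mean = torch.tensor([0.1])
std = torch.tensor([0.5])

distribution = Normal(mean, std)
action = distribution.sample()            # a ~ π(·|s)
log_prob = distribution.log_prob(action)  # log π(a|s), used in the policy-gradient update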

Policy-based methods in RL like Actor-Critic are different from Q-learning, which uses a greedy selection of the action based on its Q-estimate (hence the argmax in Q-learning).
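For contrast, a minimal sketch of that greedy selection (the Q-values here are made up for illustration):

import torch

q_values = torch.tensor([1.2, 0.4])     # hypothetical Q(s, a) estimates for actions 0 and 1
action = torch.argmax(q_values).item()  # greedy: always take the action with the highest Q-value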