The output tensor of the critic in the A2C algorithm contains the parameters of a probability distribution. Those parameters are then used to sample from that distribution.

Suppose we are in the continuous case: I would like to sample an action from a normal distribution with those parameters to create an action tensor.

- Do I need the output of the critic to be twice the size of the action space, to account for both the mean and the standard deviation of the normal distribution (assuming a diagonal covariance matrix)?
- How can I interpret p(**a|s**) with a neural network? I cannot see the difference from p(**s|a**): is **s** or **a** the input of the neural network, or both?
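
To make the two questions concrete, here is a minimal sketch of how I currently picture it, in plain Python. Everything in it is an assumption for illustration, not a reference A2C implementation: the sizes `STATE_DIM` and `ACTION_DIM`, the single linear layer (`W`, `B`) standing in for the network, and the choice to output the log of the standard deviation. If I understand correctly, only the state **s** is fed to the network, the network outputs `2 * ACTION_DIM` numbers, the action **a** is sampled from the resulting Gaussian, and p(**a|s**) is that Gaussian's density evaluated at **a**.

```python
import math
import random

STATE_DIM = 4   # hypothetical state size
ACTION_DIM = 2  # hypothetical action size

rng = random.Random(0)

# Hypothetical single linear layer standing in for the policy network.
# It has 2 * ACTION_DIM outputs: one mean and one log-std per action dimension.
W = [[rng.gauss(0.0, 0.1) for _ in range(STATE_DIM)] for _ in range(2 * ACTION_DIM)]
B = [0.0] * (2 * ACTION_DIM)


def policy_params(state):
    """Map a state s to the mean and std of a diagonal Gaussian over actions."""
    out = [sum(w * s for w, s in zip(row, state)) + b for row, b in zip(W, B)]
    mean = out[:ACTION_DIM]
    # Exponentiate the second half so every standard deviation is positive.
    std = [math.exp(v) for v in out[ACTION_DIM:]]
    return mean, std


def sample_action(state):
    """Draw a ~ N(mean(s), diag(std(s)^2)); the action is never a network input."""
    mean, std = policy_params(state)
    return [rng.gauss(m, sd) for m, sd in zip(mean, std)]


def log_prob(action, state):
    """log p(a|s): sum of independent per-dimension Gaussian log densities."""
    mean, std = policy_params(state)
    return sum(
        -0.5 * ((a - m) / sd) ** 2 - math.log(sd) - 0.5 * math.log(2.0 * math.pi)
        for a, m, sd in zip(action, mean, std)
    )


state = [rng.gauss(0.0, 1.0) for _ in range(STATE_DIM)]
action = sample_action(state)
```

In this reading the output layer is twice the action dimension, and **a** only appears when the density is evaluated, never as an input. Some implementations instead keep the standard deviation as a separate parameter that does not depend on the state, in which case the network outputs only the mean.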