Hello there,

Please, how can we apply REINFORCE (or any policy gradient algorithm) when the action space is multidimensional? Let's say that for each state the action is a vector a = [a_1, a_2, a_3], where the a_i are discrete.

In this case the output of the policy network must model a joint probability distribution.

Thanks

It depends on the structure of your action space (note that for discrete actions you sample and score log-probabilities; no reparameterization trick is needed):

If each dimension of your action vector has the same meaning (same number and meaning of choices), consider `torch.multinomial`.

If not, and the dimensions are independent of each other, create multiple output heads in your network, feed each into its own categorical distribution, and draw each dimension of the action from the corresponding distribution.
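A minimal sketch of this second option, assuming a toy state size and hypothetical per-dimension choice counts; the joint log-probability of the action vector (which REINFORCE scales by the return) is the sum of the per-dimension log-probabilities, by independence:

```python
import torch
from torch.distributions import Categorical

# Hypothetical sizes: 3 action dimensions with different numbers of choices.
n_choices = [4, 2, 5]
state_dim = 8

# A toy policy network: shared trunk plus one linear head of logits per dimension.
shared = torch.nn.Linear(state_dim, 16)
heads = torch.nn.ModuleList(torch.nn.Linear(16, n) for n in n_choices)

state = torch.randn(1, state_dim)
h = torch.relu(shared(state))

# One independent Categorical per dimension; sample each dimension separately.
dists = [Categorical(logits=head(h)) for head in heads]
actions = [d.sample() for d in dists]

# Joint log-prob = sum of per-dimension log-probs (independence assumption).
log_prob = sum(d.log_prob(a) for d, a in zip(dists, actions))

# REINFORCE loss for this step would be: loss = -return_ * log_prob
print([a.item() for a in actions], log_prob.item())
```

The gradient flows through `log_prob` into all heads and the shared trunk, so one backward pass updates the whole policy.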

If they are correlated, then you need to build your own distribution model.

Another option is to map all combinations a_1 × a_2 × a_3 to a single categorical distribution.
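A sketch of this flattening approach, with hypothetical choice counts (4, 2, 5): the network outputs one logit per combination, and the sampled flat index is unraveled back into the action vector:

```python
import torch
from torch.distributions import Categorical

# Hypothetical sizes: a = [a_1, a_2, a_3] with 4, 2 and 5 choices each.
sizes = (4, 2, 5)
n_joint = 4 * 2 * 5  # 40 combined actions

# Stand-in for the policy network output: one logit per combination.
logits = torch.randn(n_joint)
dist = Categorical(logits=logits)

flat = dist.sample()            # flat index in [0, 40)
log_prob = dist.log_prob(flat)  # joint log-prob, directly usable in REINFORCE

# Unravel the flat index (row-major order) back into (a_1, a_2, a_3).
a_3 = flat % 5
a_2 = (flat // 5) % 2
a_1 = flat // (5 * 2)
print(a_1.item(), a_2.item(), a_3.item(), log_prob.item())
```

This handles correlated dimensions for free, at the cost of an output layer whose size grows multiplicatively with the number of dimensions.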

Thanks for your reply.

Yes, the action vector components all have the same meaning.

Let's say the output of my policy network is a 2D tensor of shape (m, n), where each row sums to one,

and I want to generate, according to this probability matrix, an action vector of size m, a = [a_1, …, a_m].

In this case, is the multinomial distribution the suitable one?

Yes, you should use the multinomial distribution.
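A minimal sketch for this exact shape, with hypothetical m = 4 and n = 3: `Categorical` treats the last dimension as event probabilities, so a probability matrix of shape (m, n) yields one draw per row, i.e. the action vector of size m:

```python
import torch
from torch.distributions import Categorical

m, n = 4, 3  # m action components, n choices each (hypothetical sizes)
probs = torch.softmax(torch.randn(m, n), dim=1)  # each row sums to one

# A batch of m categorical distributions, one per row of the matrix.
dist = Categorical(probs=probs)
a = dist.sample()                 # action vector, shape (m,)
log_prob = dist.log_prob(a)       # per-component log-probs, shape (m,)
joint_log_prob = log_prob.sum()   # joint log-prob for the REINFORCE update

# torch.multinomial draws the same kind of sample, without log-probs:
a_alt = torch.multinomial(probs, num_samples=1).squeeze(1)  # shape (m,)
```

`torch.multinomial` is fine for sampling alone, but `Categorical` also gives you `log_prob`, which is what the policy gradient needs.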