Multidimensional Actions

You need the reparameterization trick:

If every dimension of your action vector has the same meaning (the same number of choices, each with the same meaning), consider torch.multinomial.
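A minimal sketch of the torch.multinomial route, assuming all dimensions can be drawn from one shared distribution over the choices. The sizes and the random logits are placeholders standing in for a real network output.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
batch_size, action_dims, num_choices = 4, 3, 5

# Stand-in for a network output: one row of logits per state,
# shared by every action dimension.
logits = torch.randn(batch_size, num_choices)
probs = torch.softmax(logits, dim=-1)

# Draw one choice index per action dimension (with replacement,
# so two dimensions may pick the same choice).
actions = torch.multinomial(probs, num_samples=action_dims, replacement=True)

# Log-probability of the whole action vector: sum over dimensions.
log_prob = torch.log(probs.gather(1, actions)).sum(dim=-1)
```

Here actions has shape (batch_size, action_dims) and log_prob has shape (batch_size,), which is the quantity a policy-gradient loss would typically use.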

If not, and the dimensions are independent of each other, give your network one output per dimension, feed each output to its own categorical distribution, and draw each dimension of your sample from the corresponding distribution.
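One way this could look, as a sketch rather than a reference implementation: the class name MultiHeadPolicy, the hidden size, and the layer layout are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class MultiHeadPolicy(nn.Module):
    """Hypothetical policy with one categorical head per action dimension."""

    def __init__(self, state_dim, choices_per_dim, hidden=64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        # One linear head per dimension; each may have a different size.
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n) for n in choices_per_dim]
        )

    def forward(self, state):
        features = self.body(state)
        dists = [Categorical(logits=head(features)) for head in self.heads]
        samples = [d.sample() for d in dists]
        action = torch.stack(samples, dim=-1)
        # Independence means the joint log-probability is a plain sum.
        log_prob = torch.stack(
            [d.log_prob(a) for d, a in zip(dists, samples)], dim=-1
        ).sum(dim=-1)
        return action, log_prob


policy = MultiHeadPolicy(state_dim=8, choices_per_dim=[3, 4, 2])
action, log_prob = policy(torch.randn(5, 8))
```

Summing the per-head log-probabilities is only valid under the independence assumption stated above.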

If the dimensions are correlated, you need to build your own model of their joint distribution.

Another option is to map every combination in a_1 x a_2 x a_3 to one outcome of a single categorical distribution.
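A sketch of the flattening approach, assuming three dimensions with hypothetical sizes 3, 4 and 2. The helpers encode and decode are illustrative names, not part of any library; they convert between a flat index and the per-dimension choices in row-major order.

```python
import math

import torch
from torch.distributions import Categorical

# Hypothetical number of choices for a_1, a_2, a_3.
sizes = (3, 4, 2)
num_joint = math.prod(sizes)  # one outcome per combination


def decode(index, sizes):
    """Flat index -> per-dimension choices (row-major order)."""
    parts = []
    for n in reversed(sizes):
        parts.append(index % n)
        index = torch.div(index, n, rounding_mode="floor")
    return torch.stack(parts[::-1], dim=-1)


def encode(action, sizes):
    """Per-dimension choices -> flat index (inverse of decode)."""
    index = torch.zeros_like(action[..., 0])
    for i, n in enumerate(sizes):
        index = index * n + action[..., i]
    return index


# Stand-in for a network output over all combinations.
logits = torch.randn(5, num_joint)
dist = Categorical(logits=logits)
flat = dist.sample()           # shape (5,)
action = decode(flat, sizes)   # shape (5, 3)
log_prob = dist.log_prob(flat)
```

The number of outputs is the product of the per-dimension sizes, so this is practical only when that product stays small. In exchange, a single distribution over all combinations can represent correlations between dimensions.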