Multi-agent deep reinforcement learning for an environment with a discrete action space

Hi, I have been doing the Udacity deep reinforcement learning nanodegree and a question came up. Do you know of any cutting-edge deep reinforcement learning algorithm that can be successfully applied to discrete action spaces in multi-agent settings?

I have been researching and found MADDPG and Soft Q-learning to be among the state-of-the-art algorithms. I implemented the first one on a Unity environment and it works well! However, both are mainly focused on environments with continuous action spaces. Although they can be applied to discrete action spaces (e.g. MADDPG with Gumbel-softmax), that does not seem to be what they are intended for (I tried MADDPG with Gumbel-softmax and got disastrous results…). Their papers don’t give many details on how to use them in these settings.
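For readers unfamiliar with the Gumbel-softmax trick mentioned above, here is a minimal sketch of the sampling step in plain Python. The function names (`gumbel_softmax`, `to_one_hot`) and the temperature value are illustrative assumptions, not taken from the MADDPG paper or any library. In an actual MADDPG implementation this would be done in an autodiff framework so that gradients flow through the relaxed sample; this sketch only shows the forward computation.

```python
import math
import random

def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Draw one relaxed (soft) categorical sample from unnormalized logits."""
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))  # sample from Gumbel(0, 1)
        noisy.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed logits.
    m = max(noisy)
    exps = [math.exp(x - m) for x in noisy]
    z = sum(exps)
    return [e / z for e in exps]

def to_one_hot(probs):
    """Discretize a relaxed sample into the one-hot action sent to the environment."""
    k = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(probs))]

# Hypothetical actor output for an agent with 3 discrete actions.
sample = gumbel_softmax([1.0, 2.0, 3.0], temperature=0.5)
action = to_one_hot(sample)
```

A lower temperature pushes the soft sample closer to one-hot but makes gradients noisier, which is one knob that may be worth checking when results with discrete actions are poor.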

Can somebody help me with this?

There’s quite a bit if you do a regular Google search. Here’s a link.

Concerning the soft Q-learning approach, the adaptation to discrete worlds looks simple:

In the critic update, use
Q(a,s) = r(a,s) + sum_s’ ( T(s’|a,s) * V(s’) )
V(s) = alpha * log( sum_a exp( Q(a,s) / alpha ) )

and compute the new policy directly, for all agents:
pi(a|s) = softmax_a( Q(a,s) / alpha )
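As a rough illustration of the update above, here is a tabular sketch in plain Python for a small, fully known MDP. The discount factor `gamma`, the toy rewards `R`, and the transitions `T` are my own assumptions for the example (the equations above are written without a discount; setting `gamma=1.0` recovers that form but the iteration then only makes sense for finite horizons). The soft value is computed as alpha times the log-sum-exp so that it is consistent with the softmax policy.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def soft_value(q_row, alpha):
    """V(s) = alpha * log sum_a exp(Q(a,s) / alpha)."""
    return alpha * logsumexp([q / alpha for q in q_row])

def soft_policy(q_row, alpha):
    """pi(a|s) = softmax_a(Q(a,s) / alpha), written as exp((Q - V) / alpha)."""
    v = soft_value(q_row, alpha)
    return [math.exp((q - v) / alpha) for q in q_row]

def soft_q_iteration(R, T, alpha=0.5, gamma=0.9, n_iters=200):
    """R[s][a] is the reward, T[s][a][s2] the transition probability."""
    n_states, n_actions = len(R), len(R[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        V = [soft_value(Q[s], alpha) for s in range(n_states)]
        Q = [[R[s][a] + gamma * sum(T[s][a][s2] * V[s2] for s2 in range(n_states))
              for a in range(n_actions)]
             for s in range(n_states)]
    return Q

# Made-up MDP: 2 states, 2 actions; action a moves deterministically to state a.
R = [[1.0, 0.0], [0.0, 2.0]]
T = [[[1.0, 0.0], [0.0, 1.0]],
     [[1.0, 0.0], [0.0, 1.0]]]
Q = soft_q_iteration(R, T)
pi = [soft_policy(Q[s], alpha=0.5) for s in range(len(R))]
```

With function approximation, the sum over s’ would be replaced by sampled transitions and the table by a network with one output per discrete action, but the soft value and softmax policy are computed the same way.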

Q-learning was originally developed for Markov decision processes with discrete action spaces. A fine example:

But this is not multi-agent…

Same here, this is not multi-agent.

The paper you mention about multi-agent soft Q-learning takes a centralized approach, where all agents share a common critic and a joint policy (one network that outputs one action per agent). My answer focused on that case.
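To make the centralized case concrete, here is a small sketch of a joint soft policy over the product of the agents' discrete action sets, for one fixed state. This is only my reading of "shared critic, joint policy", not code from the paper; the names `joint_soft_policy` and `agent_marginal` and the toy Q-values are hypothetical.

```python
import itertools
import math

def joint_soft_policy(q_joint, n_actions_per_agent, alpha=1.0):
    """Softmax over joint actions: pi(a_1..a_n | s) ~ exp(Q(a_1..a_n, s) / alpha).

    q_joint maps a joint-action tuple to the shared critic's Q-value.
    """
    joint_actions = list(itertools.product(*[range(n) for n in n_actions_per_agent]))
    scaled = [q_joint[a] / alpha for a in joint_actions]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return {a: e / z for a, e in zip(joint_actions, exps)}

def agent_marginal(policy, agent_index, n_actions):
    """Marginal action distribution of a single agent under the joint policy."""
    marginal = [0.0] * n_actions
    for joint_action, p in policy.items():
        marginal[joint_action[agent_index]] += p
    return marginal

# Toy coordination game: two agents are rewarded for picking the same action.
q = {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}
joint_pi = joint_soft_policy(q, [2, 2], alpha=0.5)
marginal_0 = agent_marginal(joint_pi, agent_index=0, n_actions=2)
```

Note that the joint action set grows exponentially with the number of agents, so explicit enumeration like this is only practical for a few agents with small action sets.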