Hi, I have been doing the Udacity Deep Reinforcement Learning Nanodegree and a question came up. Do you know of, or have you heard about, any cutting-edge deep reinforcement learning algorithm that can be applied successfully to discrete action spaces in multi-agent settings?
I have been researching this and found MADDPG and Soft Q-learning to be the top algorithms in the state of the art. I implemented the first one on a Unity environment and it works well! However, both are mainly designed for environments with continuous action spaces. Although they can be applied to discrete action spaces (e.g. MADDPG with Gumbel-Softmax), that does not seem to be what they are intended for: I tried MADDPG with Gumbel-Softmax and got disastrous results. Their papers also give few details on how to use them in these settings.
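For reference, here is a minimal, dependency-free sketch of the Gumbel-Softmax trick mentioned above, i.e. how a discrete action can be drawn from an actor's logits so that a MADDPG-style critic receives a (relaxed) one-hot action. It is an illustration under stated assumptions, not the MADDPG authors' code: the function name `gumbel_softmax` and its parameters are chosen here for illustration, and plain Python has no autograd, so `hard=True` only reproduces the forward pass of the straight-through variant. In a real PyTorch implementation, `torch.nn.functional.gumbel_softmax(logits, tau=..., hard=...)` provides this with gradients. The temperature `tau` is a hyperparameter, so it may be one knob worth checking when results are poor.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, hard=False, rng=random):
    """Draw a relaxed one-hot sample from a categorical distribution.

    logits: unnormalized log-probabilities, one per discrete action.
    tau: temperature; lower values push the sample closer to one-hot.
    hard: if True, return the one-hot argmax of the relaxed sample
          (forward pass of the straight-through estimator only).
    """
    # Gumbel(0, 1) noise: g = -log(-log(U)) with U ~ Uniform(0, 1).
    gumbels = []
    for _ in logits:
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        gumbels.append(-math.log(-math.log(u)))

    # Perturb the logits, scale by temperature, then apply a stable softmax.
    scores = [(l + g) / tau for l, g in zip(logits, gumbels)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    soft = [e / total for e in exps]
    if not hard:
        return soft

    # Hard sample: one-hot vector at the index of the largest relaxed weight.
    k = max(range(len(soft)), key=soft.__getitem__)
    return [1.0 if i == k else 0.0 for i in range(len(soft))]


# Usage: the same seed gives the same Gumbel noise, so the hard sample
# is the one-hot version of the soft sample.
actor_logits = [2.0, 0.5, -1.0]
soft = gumbel_softmax(actor_logits, tau=0.5, rng=random.Random(0))
hard = gumbel_softmax(actor_logits, tau=0.5, hard=True, rng=random.Random(0))
print(soft, hard)
```

In MADDPG the soft (or straight-through hard) sample would be fed to the critic in place of a continuous action, while the environment receives the argmax index.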
The paper you mention on multi-agent soft Q-learning uses a centralized approach, in which all agents share a common critic and a joint policy (a single network that outputs one action per agent). My answer focused on that case.
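To make "one network giving one action per agent" concrete, here is a hypothetical sketch of a joint policy with discrete actions: a single linear layer maps the concatenated observations of all agents to `n_agents * n_actions` logits, which are split into one head per agent and turned into greedy actions. The class `JointPolicy`, its parameters and the random initialization are assumptions for illustration only; the shared critic and any training logic are omitted, and this is not the architecture from the paper.

```python
import random


class JointPolicy:
    """Hypothetical centralized policy: one linear layer from the joint
    observation to n_agents * n_actions logits, split into per-agent heads."""

    def __init__(self, obs_dim, n_agents, n_actions, seed=0):
        rng = random.Random(seed)
        self.n_agents = n_agents
        self.n_actions = n_actions
        in_dim = obs_dim * n_agents  # all agents' observations concatenated
        out_dim = n_agents * n_actions  # one block of logits per agent
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def logits(self, observations):
        """Return a list with one list of action logits per agent."""
        assert len(observations) == self.n_agents
        joint_obs = [x for obs in observations for x in obs]
        flat = [sum(w * x for w, x in zip(row, joint_obs)) + b
                for row, b in zip(self.weights, self.bias)]
        # Split the flat output into consecutive per-agent heads.
        return [flat[i * self.n_actions:(i + 1) * self.n_actions]
                for i in range(self.n_agents)]

    def act(self, observations):
        """Greedy discrete action (argmax of its head) for every agent."""
        return [max(range(self.n_actions), key=head.__getitem__)
                for head in self.logits(observations)]


# Usage: three agents, 4-dimensional observations, 5 discrete actions each.
policy = JointPolicy(obs_dim=4, n_agents=3, n_actions=5)
observations = [[0.1] * 4, [0.2] * 4, [-0.3] * 4]
actions = policy.act(observations)
print(actions)
```

Because every head sees the joint observation, the agents' actions are coordinated through one set of weights; the trade-off is that execution also needs everyone's observations, unlike decentralized actors.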