I have been following the tutorial on multi-agent PPO from TorchRL: Multi-Agent Reinforcement Learning (PPO) with TorchRL Tutorial — torchrl 0.8 documentation. However, I am using the MPE2 speaker-listener environment, where the two agents have different observation and action spaces.
Does TorchRL support this setting?
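For context, this is roughly how I build the environment. It is a minimal sketch: I am assuming the mpe2 package exposes simple_speaker_listener_v4 the same way PettingZoo's MPE did, and the exact import path or version suffix may differ on other installs.

```python
from torchrl.envs import PettingZooWrapper

# Assumption: mpe2 mirrors the old PettingZoo MPE API; the module name
# and version suffix below may differ depending on the installed version.
from mpe2 import simple_speaker_listener_v4

base_env = simple_speaker_listener_v4.parallel_env(continuous_actions=True)
env = PettingZooWrapper(env=base_env)

# The speaker and the listener end up with different-sized specs:
print(env.observation_spec)
print(env.full_action_spec)
```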
The MultiAgentMLP requires a single int for n_agent_outputs, not a list of per-agent sizes. To work around this, I implemented my own network that supports a different input and output shape for each agent. However, when the ProbabilisticActor runs, its distribution samples only a single tensor rather than one action per agent.
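Here is a simplified sketch of what my workaround looks like. The layer sizes, agent names, and key layout are placeholders (the real shapes come from the env specs), and I have cut it down to the part where I get stuck:

```python
from torch import nn
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import ProbabilisticActor, TanhNormal

# Illustrative shapes only; the real ones come from env.observation_spec.
obs_dims = {"speaker_0": 3, "listener_0": 11}
act_dims = {"speaker_0": 3, "listener_0": 5}


class HeteroPolicyNet(nn.Module):
    """One MLP head per agent, each with its own input/output width."""

    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleDict(
            {
                name: nn.Sequential(
                    nn.Linear(obs_dims[name], 64),
                    nn.Tanh(),
                    nn.Linear(64, 2 * act_dims[name]),
                    NormalParamExtractor(),  # splits the output into loc / scale
                )
                for name in obs_dims
            }
        )

    def forward(self, speaker_obs, listener_obs):
        # Per-agent outputs have different sizes, so they cannot be stacked
        # into one (n_agents, ...) tensor the way MultiAgentMLP expects.
        s_loc, s_scale = self.heads["speaker_0"](speaker_obs)
        l_loc, l_scale = self.heads["listener_0"](listener_obs)
        return s_loc, s_scale, l_loc, l_scale


policy_net = TensorDictModule(
    HeteroPolicyNet(),
    in_keys=[("speaker_0", "observation"), ("listener_0", "observation")],
    out_keys=[
        ("speaker_0", "loc"),
        ("speaker_0", "scale"),
        ("listener_0", "loc"),
        ("listener_0", "scale"),
    ],
)

# This is where I get stuck: with a single distribution_class the actor only
# samples one tensor (here, the speaker's action) instead of one per agent.
actor = ProbabilisticActor(
    module=policy_net,
    in_keys=[("speaker_0", "loc"), ("speaker_0", "scale")],
    out_keys=[("speaker_0", "action")],
    distribution_class=TanhNormal,
    return_log_prob=True,
)
```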
I would appreciate any advice on this topic!