I have been following the tutorial on multi-agent PPO from TorchRL: Multi-Agent Reinforcement Learning (PPO) with TorchRL Tutorial — torchrl 0.8 documentation. However, I am using the MPE2 speaker-listener environment, where the two agents have different observation and action spaces.
Does TorchRL support this setting?
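For context, this is roughly how I build the environment. It is a minimal sketch: I am assuming the mpe2 package exposes simple_speaker_listener_v4 the same way PettingZoo's MPE did, and the exact import path or version suffix may differ on other installs.

```python
from torchrl.envs import PettingZooWrapper

# Assumption: mpe2 mirrors the old PettingZoo MPE API; the module name
# and version suffix below may differ depending on the installed version.
from mpe2 import simple_speaker_listener_v4

base_env = simple_speaker_listener_v4.parallel_env(continuous_actions=True)
env = PettingZooWrapper(env=base_env)

# The speaker and the listener end up with different-sized specs:
print(env.observation_spec)
print(env.full_action_spec)
```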
The MultiAgentMLP requires a single int for n_agent_outputs, not a list of per-agent sizes. To work around this, I implemented my own network that supports a different input and output shape for each agent. However, when the ProbabilisticActor runs, its distribution samples only a single tensor rather than one action per agent.
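Here is a simplified sketch of what my workaround looks like. The layer sizes, agent names, and key layout are placeholders (the real shapes come from the env specs), and I have cut it down to the part where I get stuck:

```python
from torch import nn
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import ProbabilisticActor, TanhNormal

# Illustrative shapes only; the real ones come from env.observation_spec.
obs_dims = {"speaker_0": 3, "listener_0": 11}
act_dims = {"speaker_0": 3, "listener_0": 5}


class HeteroPolicyNet(nn.Module):
    """One MLP head per agent, each with its own input/output width."""

    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleDict(
            {
                name: nn.Sequential(
                    nn.Linear(obs_dims[name], 64),
                    nn.Tanh(),
                    nn.Linear(64, 2 * act_dims[name]),
                    NormalParamExtractor(),  # splits the output into loc / scale
                )
                for name in obs_dims
            }
        )

    def forward(self, speaker_obs, listener_obs):
        # Per-agent outputs have different sizes, so they cannot be stacked
        # into one (n_agents, ...) tensor the way MultiAgentMLP expects.
        s_loc, s_scale = self.heads["speaker_0"](speaker_obs)
        l_loc, l_scale = self.heads["listener_0"](listener_obs)
        return s_loc, s_scale, l_loc, l_scale


policy_net = TensorDictModule(
    HeteroPolicyNet(),
    in_keys=[("speaker_0", "observation"), ("listener_0", "observation")],
    out_keys=[
        ("speaker_0", "loc"),
        ("speaker_0", "scale"),
        ("listener_0", "loc"),
        ("listener_0", "scale"),
    ],
)

# This is where I get stuck: with a single distribution_class the actor only
# samples one tensor (here, the speaker's action) instead of one per agent.
actor = ProbabilisticActor(
    module=policy_net,
    in_keys=[("speaker_0", "loc"), ("speaker_0", "scale")],
    out_keys=[("speaker_0", "action")],
    distribution_class=TanhNormal,
    return_log_prob=True,
)
```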
I would appreciate any advice on this topic!