Creating a custom MARL env in TorchRL

I am using PyTorch/TorchRL to train a model to play a variant of poker. I have implemented the game as an environment, following the outline in the Pendulum tutorial (Pendulum: Writing your environment and transforms with TorchRL — PyTorch Tutorials 2.3.0+cu121 documentation). I would like to make this a multi-agent environment using the MARL API referred to here: Competitive Multi-Agent Reinforcement Learning (DDPG) with TorchRL Tutorial — torchrl 0.4 documentation

I cannot find any documentation or tutorials explaining how to implement a multi-agent environment or the details behind the API. Are there any resources you suggest to understand how the spec should be laid out?

Looks like an amazing project!
Do you think you can get enough info here to get started?
https://pytorch.org/rl/stable/reference/envs.html#multi-agent-environments
Otherwise happy to write more stuff if you can pinpoint what’s blocking you!

Thank you for the rapid reply!

I did see that page, and I could proceed by analogy to fill out the spec. However, it really only gives a single example output. Compared with the Pendulum tutorial, which gives a fairly complete walk-through of an implementation, I find it less useful, but I can try. I’d also look at the code of some of the existing environments that use the MARL API.

In terms of specific questions, I have two:

  1. As a general principle, is it just a matter of nesting the usual env fields in the specs and output TensorDicts using these examples as a guide? I was wondering if there are other requirements under the hood that might be missing or obscure from just looking at those outputs.

  2. For my particular case, because it’s a competitive turn-based game, the agents act in sequence rather than in parallel (in contrast to the first paragraph of the “Environment” section in the multi-agent competitive DDPG tutorial). At any time, the environment can report whose turn it is to move (or the turn is simply determined by cycling through the players). How should this be handled?
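
To make both questions concrete, here is a minimal sketch of the layout I have in mind. It is plain Python with nested dicts standing in for TensorDicts and makes no TorchRL calls, so nothing in it should be read as the library's actual API. The "agents" group mirrors the nested example on the multi-agent environments page; the names TurnBasedToyEnv, is_turn and current_player, the root-level done flag, and the reward rule are all hypothetical placeholders of mine.

```python
class TurnBasedToyEnv:
    """Toy turn-based game: players act one at a time, in a fixed cycle."""

    def __init__(self, n_players=3, max_turns=6):
        self.n_players = n_players
        self.max_turns = max_turns
        self.turn = 0

    def reset(self):
        self.turn = 0
        return self._output(rewards=[0.0] * self.n_players)

    def step(self, actions):
        # `actions` holds one entry per agent, as a parallel-acting API expects.
        # Only the entry of the player whose turn it is gets applied.
        player = self.turn % self.n_players
        rewards = [0.0] * self.n_players
        rewards[player] = float(actions[player])  # placeholder reward rule
        self.turn += 1
        return self._output(rewards)

    def _output(self, rewards):
        # Player who moves next, determined by cycling through the players.
        player = self.turn % self.n_players
        return {
            # Per-agent fields are nested under one group with a leading
            # n_players dimension (assumed layout, not confirmed).
            "agents": {
                "observation": [[float(self.turn)] for _ in range(self.n_players)],
                "reward": [[r] for r in rewards],
                # Hypothetical mask: True only for the player to move.
                "is_turn": [[i == player] for i in range(self.n_players)],
            },
            # Shared fields stay at the root (assumed).
            "current_player": player,
            "done": self.turn >= self.max_turns,
        }


env = TurnBasedToyEnv(n_players=3, max_turns=4)
out = env.reset()
out = env.step([1, 9, 9])  # entries for players 1 and 2 are ignored this turn
```

The idea is that every agent still submits an action at every step, and the environment ignores all but the current player's, with is_turn available to mask the other agents' actions and losses. Is that roughly the intended pattern, or is there a more direct way to express sequential turns?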