I cannot find any documentation or tutorials explaining how to implement a multi-agent environment or the details behind the API. Are there any resources you suggest to understand how the spec should be laid out?
I did see that page and could proceed by analogy to fill out the spec. However, it only gives a single example output; compared with the Pendulum tutorial, which provides a fairly complete walk-through of an implementation, I find it less useful, but I can try. I'll also look at the code of some existing environments that use the MARL API.
In terms of specific questions, I have two:
As a general principle, is it just a matter of nesting the usual env fields in the specs and output TensorDicts, using these examples as a guide? I was wondering whether there are other requirements under the hood that might be missed or obscured by only looking at those outputs.
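For concreteness, here is roughly the nested layout I have in mind, with plain Python dicts standing in for TensorDicts. The `("agents", ...)` grouping follows the convention shown in the multi-agent tutorials; the dimension sizes and key names here are just my own illustrative guesses, not anything from the docs:

```python
import torch

n_agents = 2
obs_dim, act_dim = 4, 1

# Sketch of what I imagine a step's output TensorDict would contain,
# using a plain dict as a stand-in. Per-agent entries are stacked
# along a leading agent dimension under the "agents" group; done
# flags stay at the root and are shared.
step_output = {
    "agents": {
        "observation": torch.zeros(n_agents, obs_dim),  # per-agent obs
        "reward": torch.zeros(n_agents, 1),             # per-agent reward
    },
    "done": torch.zeros(1, dtype=torch.bool),        # shared done flag
    "terminated": torch.zeros(1, dtype=torch.bool),
}
```

My question is essentially whether mirroring this nesting in the specs is sufficient, or whether the env base class imposes extra constraints (batch sizes, key names, masks) that aren't visible from example outputs alone.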
For my particular case, because it's a competitive turn-based game, the agents do not act in parallel but in sequence (in contrast to the first paragraph of the "Environment" section in the Multiagent Competitive DDPG tutorial). At any time the environment can report which player's turn it is to move (or the turn is simply determined by cycling through the players). How should this be handled?
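One way I imagine handling this, sketched below with plain Python and torch (the `turn` and `action_mask` keys are my own invention, not anything I've seen in the API): expose the current player in the observation and mask the non-acting players' actions, so the environment only applies the acting player's move each step.

```python
import torch

n_players = 2

def next_turn(turn: int) -> int:
    # Simple cyclic turn order; real game logic could instead
    # decide the next player (e.g. extra turns, skipped turns).
    return (turn + 1) % n_players

# Hypothetical per-step state: whose turn it is, plus a boolean
# mask marking which agent's action the env will actually use.
turn = 0
action_mask = torch.zeros(n_players, dtype=torch.bool)
action_mask[turn] = True
state = {"turn": torch.tensor(turn), "action_mask": action_mask}

turn = next_turn(turn)
```

Whether something like this fits the intended use of the multi-agent API, or whether turn order is supposed to be handled some other way, is exactly what I'm unsure about.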