Implementation of multi-agent learning

Hello, can someone help me with implementing coordinated multi-agent imitation learning? I am using the Gym Stag Hunt environment.

Hi @TKO
Thanks for this. This is a broad question, and I think your chances of getting a detailed answer will increase substantially if you narrow down the scope a bit. Can you describe what you would like to do and what is blocking you?

I have a lot of problems :D. First, I need to convert this state to a 3D array so that my neural network can learn from it.
state

You should map it to a float tensor and divide by 255. Have a look at the torchvision transforms for this.
I wonder what those dimensions are though: is it that you have 36 agents, each returning an image as observation?
A better way to represent that would be a tensor of shape [36, 212, 158, 3] so that you keep the channels separated for the transforms.
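
To make the suggestion above concrete, here is a minimal sketch of the conversion. It assumes the observations are uint8 RGB arrays stacked in the [36, 212, 158, 3] layout mentioned above. The random `frames` array and the helper name `frames_to_tensor` are made up for illustration and are not part of the environment's API. It uses plain PyTorch ops rather than torchvision, and it also moves the channels to dimension 1, which is the layout convolutional layers expect.

```python
import numpy as np
import torch

# Hypothetical stand-in for the real observations: 36 RGB frames of
# 212x158 pixels, stored as uint8 values in [0, 255].
frames = np.random.randint(0, 256, size=(36, 212, 158, 3), dtype=np.uint8)


def frames_to_tensor(frames: np.ndarray) -> torch.Tensor:
    """Convert uint8 frames [N, H, W, C] to a float32 tensor [N, C, H, W] in [0, 1]."""
    x = torch.from_numpy(frames).float() / 255.0  # scale pixel values to [0, 1]
    return x.permute(0, 3, 1, 2).contiguous()     # channels-last -> channels-first


batch = frames_to_tensor(frames)
print(batch.shape, batch.dtype)
```

If the leading dimension turns out to index something other than agents, the same function still applies, since it only assumes that the last dimension holds the color channels.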

It is not 36 agents, only 2. The screenshot is incomplete. It shows the state of 8 played games.

I use this environment: GitHub - NullDefault/Gym-Stag-Hunt: A custom reinforcement learning environment for OpenAI Gym & PettingZoo that implements various Stag Hunt-like social dilemma games.