Centralized learning-decentralized execution clarification (engineering perspective on PPO algo)

Kimonili · September 1, 2020, 2:33pm

So in general, there will be N = number_of_agents critic networks and N actor networks. Does the centralized learning come from the fact that each critic network receives state information from the other agents and then updates its weights?
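To make sure I have the layout right, here is a minimal sketch of how I picture it, not a full PPO implementation. The sizes (`N_AGENTS`, `OBS_DIM`, `N_ACTIONS`), the class names, and the choice of a discrete action space are my own assumptions for illustration. Each actor sees only its own observation, while each critic sees the concatenated observations of all agents.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
N_AGENTS, OBS_DIM, N_ACTIONS = 3, 4, 5


class Actor(nn.Module):
    """Decentralized actor: sees only its own agent's observation."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 32), nn.Tanh(), nn.Linear(32, N_ACTIONS)
        )

    def forward(self, own_obs):
        # Returns a distribution over discrete actions (an assumption).
        return torch.distributions.Categorical(logits=self.net(own_obs))


class CentralCritic(nn.Module):
    """Centralized critic: sees the observations of all agents."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_AGENTS * OBS_DIM, 32), nn.Tanh(), nn.Linear(32, 1)
        )

    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)


# One actor and one critic per agent.
actors = [Actor() for _ in range(N_AGENTS)]
critics = [CentralCritic() for _ in range(N_AGENTS)]

obs = torch.randn(N_AGENTS, OBS_DIM)  # one observation per agent
joint_obs = obs.reshape(-1)           # only the critics receive this

# Execution: each actor uses its own observation only.
actions = [actors[i](obs[i]).sample() for i in range(N_AGENTS)]
# Training: each critic evaluates the joint observation.
values = torch.stack([critics[i](joint_obs) for i in range(N_AGENTS)])
```

If this is right, the critics are only needed during training, so at execution time each agent can act from its local observation alone.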

So here all the agents take an action simultaneously.

Given the action each agent took in the previous step, they transition to a new state. The state of each agent is then shared with all the other agents, so every agent is aware of the position of every other agent.
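As a toy illustration of this step, here is how I imagine the simultaneous transition and the state sharing. The grid world, the `MOVES` table, and the helper names `step_all` and `relative_positions` are all hypothetical, and encoding the shared state as relative positions is just one possible choice.

```python
from typing import List, Tuple

Position = Tuple[int, int]

# Hypothetical action set: stay, up, down, right, left.
MOVES = {0: (0, 0), 1: (0, 1), 2: (0, -1), 3: (1, 0), 4: (-1, 0)}


def step_all(positions: List[Position], actions: List[int]) -> List[Position]:
    """Apply every agent's action at once and return the new positions."""
    return [
        (x + MOVES[a][0], y + MOVES[a][1])
        for (x, y), a in zip(positions, actions)
    ]


def relative_positions(positions: List[Position], i: int) -> List[Position]:
    """Positions of all other agents, expressed relative to agent i."""
    xi, yi = positions[i]
    return [(x - xi, y - yi) for j, (x, y) in enumerate(positions) if j != i]


positions = [(0, 0), (2, 1), (5, 5)]
actions = [1, 3, 0]  # all chosen before anyone moves

positions = step_all(positions, actions)
# After the joint step, every agent is told where the others are.
shared = [relative_positions(positions, i) for i in range(len(positions))]
```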

So here each critic evaluates its own agent's actions while being aware of the new dynamics of the environment (after being informed of the relative position of every agent)?

This step is related to the previous one so the question remains the same.

I believe then that the best choice is torch.jit.fork, which, as I understand it, can run on the CPU as well.
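For reference, this is roughly how I would use it on the CPU. The function `critic_forward` and the tensor shapes are made up for illustration and stand in for a real critic network. One caveat I am not fully sure about: as far as I can tell, the forked tasks only actually run in parallel when `torch.jit.fork` is called from inside TorchScript; in plain eager Python, as below, each call appears to run synchronously and simply returns an already completed future.

```python
import torch


def critic_forward(weight: torch.Tensor, joint_obs: torch.Tensor) -> torch.Tensor:
    # Hypothetical stand-in for one critic's forward pass.
    return torch.tanh(joint_obs @ weight).sum()


# Three hypothetical critics, each with its own weights, all on the CPU.
weights = [torch.randn(12, 8) for _ in range(3)]
joint_obs = torch.randn(12)

# Launch one task per critic; each call returns a future.
futures = [torch.jit.fork(critic_forward, w, joint_obs) for w in weights]

# Collect the results; torch.jit.wait blocks until a task has finished.
values = torch.stack([torch.jit.wait(f) for f in futures])
```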

I am really sorry for the questions, I just want to be sure I understand this correctly!

Thank you so much!