I need to make my PPO implementation reproducible, but in a multi-agent environment. When I applied the same algorithm in a single-agent setting, reproducibility was successful.
I suspect the problem lies in the initialization of the actor and critic networks.
I create the critic networks with the snippet below (the actor networks are built with similar code):
# one critic per agent; re-seeding before each construction
# gives every critic identical initial weights
for _ in range(len(self.env.agents)):
    torch.manual_seed(self.seed)
    critic = Critic(sum(obs_size), self.hidden_size).to(self.device)
    self.critics.append(critic)
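To make the snippet above easier to follow, here is a simplified sketch of what my Critic looks like; the layer layout is illustrative, only the constructor signature matches the call above:

import torch
import torch.nn as nn

class Critic(nn.Module):
    # illustrative sketch: a small MLP mapping the joint observation
    # to a single value estimate
    def __init__(self, obs_size: int, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, hidden_size),
            nn.Tanh(),
            nn.Linear(hidden_size, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)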
The environment is also initialized with a seed, as shown below:
def init_env(gym_id: str, seed: int):
    # create the multi-agent environment and seed both the
    # environment and NumPy's global random state
    env = make_env(gym_id, discrete_action=True)
    env.seed(seed)
    np.random.seed(seed)
    return env, seed
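I call it once at startup, roughly like this (the gym_id here is just an example):

env, seed = init_env("simple_spread", seed=42)
obs = env.reset()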
Additionally, the buffer from which the agents update their policy is shuffled with a fixed seed, as shown below:
ids = np.arange(self.trajectory_size)
for agent, _ in enumerate(self.env.agents):
    for epoch in range(self.epochs):
        # re-seeding before every shuffle applies the same
        # permutation each time, so the ordering is deterministic
        np.random.seed(self.seed)
        np.random.shuffle(ids)
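The shuffled ids are then used to slice minibatches out of the trajectory buffer, roughly like this (the minibatch size and buffer names here are illustrative, not my exact code):

minibatch_size = 64  # illustrative
for start in range(0, self.trajectory_size, minibatch_size):
    batch_ids = ids[start:start + minibatch_size]
    # gather this agent's stored rollout data for the minibatch
    obs_b = self.obs_buffer[agent][batch_ids]
    actions_b = self.actions_buffer[agent][batch_ids]
    # ... compute the clipped PPO loss on (obs_b, actions_b) ...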
Is there anything else in the PPO algorithm that is sensitive to randomness (and therefore needs to be seeded) that I am missing?