Reproducibility in multiagent environments

I need to write reproducible code for the PPO algorithm, but in a multi-agent environment. When I applied the same algorithm in a single-agent setting, reproducibility was successful.

I think the problem lies in the initialization of the actor and critic networks.

I create the critic networks using the code snippet below; a similar snippet creates the actor networks.

    for _ in range(len(self.env.agents)):
        critic = Critic(sum(obs_size), self.hidden_size).to(self.device)
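For completeness, here is a minimal, self-contained sketch of how I set the global seeds before the networks are built. The `Critic` class here is a simplified stand-in for my actual network, and `seed_everything` is just a helper name I use:

```python
import random

import numpy as np
import torch
import torch.nn as nn


def seed_everything(seed: int) -> None:
    """Seed every RNG that network initialization can draw from."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)  # seeds CPU and all CUDA devices


class Critic(nn.Module):
    """Simplified stand-in for the actual critic network."""

    def __init__(self, obs_size: int, hidden_size: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 1),
        )


# Re-applying the same seed before construction yields identical weights.
seed_everything(42)
a = Critic(8, 32)
seed_everything(42)
b = Critic(8, 32)
assert all(torch.equal(p, q) for p, q in zip(a.parameters(), b.parameters()))
```

Note that in the multi-agent case the networks are built in a loop, so the order of construction also matters: each `Critic` consumes draws from the same global RNG stream, and adding or reordering agents shifts every subsequent initialization.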

The environment is also initialized with a seed, as shown below:

    def init_env(gym_id: str, seed: int):
        env = make_env(gym_id, discrete_action=True)
        return env, seed
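The seed itself is then applied at reset time, following the Gym-style `reset(seed=...)` convention. A minimal sketch of that pattern, with `DummyEnv` as a self-contained stand-in for whatever `make_env` returns:

```python
import numpy as np


class DummyEnv:
    """Stand-in for the real environment returned by make_env."""

    def __init__(self):
        self.rng = np.random.default_rng()

    def reset(self, seed=None):
        # Gym-style API: re-seed the environment's own RNG on reset
        if seed is not None:
            self.rng = np.random.default_rng(seed)
        return self.rng.random(4)  # initial observation


env = DummyEnv()
obs_a = env.reset(seed=0)
obs_b = env.reset(seed=0)
# same seed on reset -> identical initial observations
assert np.allclose(obs_a, obs_b)
```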

Additionally, the buffer from which the agents sample to update their policies is also seeded, as shown below:

    ids = np.arange(self.trajectory_size)
    for agent, _ in enumerate(self.env.agents):
        for epoch in range(self.epochs):
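The shuffling of these minibatch indices uses a NumPy generator created from a fixed seed, roughly like the sketch below (with `trajectory_size` shown as a literal for illustration):

```python
import numpy as np

trajectory_size = 8
rng = np.random.default_rng(seed=0)  # seeded generator used for all shuffles

ids = np.arange(trajectory_size)
rng.shuffle(ids)  # in-place shuffle of minibatch indices
first = ids.copy()

# Re-creating the generator with the same seed reproduces the permutation.
rng = np.random.default_rng(seed=0)
ids = np.arange(trajectory_size)
rng.shuffle(ids)
assert np.array_equal(first, ids)
```

Using a dedicated `Generator` rather than the global `np.random` state keeps the shuffle order independent of any other NumPy calls made elsewhere in the training loop.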

Is there anything else in the PPO algorithm that is sensitive to randomness (and has to be seeded) that I am missing?