While implementing the DPPO algorithm in PyTorch, I found that all of my workers act exactly the same, and I don't know why. In my game environment every worker starts from the same initial state, and each worker samples its actions with the sample() method of a torch.distributions distribution. In my main function I set
np.random.seed(params.seed)
torch.manual_seed(params.seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
to guarantee the reproducibility of the algorithm. I suspect this global seeding is what eliminates the variation between workers, but I don't know how to fix it. How can I solve this problem? Help me!
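Since the post doesn't include my worker code, here is a minimal, hypothetical sketch of what I think is happening, using NumPy instead of the actual torch.distributions call just to illustrate the seeding behaviour. `sample_actions` and `use_rank_offset` are made-up names for illustration; the rank-offset idea is a common pattern in distributed RL, not necessarily the right fix for my setup:

```python
import numpy as np

def sample_actions(seed, rank, use_rank_offset=False):
    # Hypothetical stand-in for one DPPO worker's action sampling.
    # If every worker builds its RNG from the same seed, all workers
    # draw the exact same "random" action sequence.
    rng = np.random.default_rng(seed + rank if use_rank_offset else seed)
    return rng.integers(0, 4, size=20).tolist()

# All workers seeded identically -> identical action sequences.
a0 = sample_actions(42, rank=0)
a1 = sample_actions(42, rank=1)
assert a0 == a1

# Offsetting the seed by the worker's rank decorrelates the streams.
b0 = sample_actions(42, rank=0, use_rank_offset=True)
b1 = sample_actions(42, rank=1, use_rank_offset=True)
assert b0 != b1
```

If this is indeed the cause, would calling something like torch.manual_seed(params.seed + rank) inside each worker process keep the run reproducible while still letting workers explore differently?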