Different workers act exactly the same

When implementing the DPPO algorithm in PyTorch, I found that different workers act exactly the same, and I don't know why. In my game environment, every worker starts from the same initial state, and I sample actions in each worker with `torch.distributions`' `sample()` method. In the `main` function, I use

torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

to guarantee the reproducibility of the algorithm, so I suspect these settings have some influence on the variation between workers. How can I solve this problem?
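To illustrate the symptom: if every worker process starts from the same global RNG state, `torch.distributions` draws identical action sequences in each one. A minimal sketch (the distribution and seed value here are illustrative, not taken from the linked code):

```python
import torch

def sample_actions(seed, n=5):
    # Simulate one worker: seed the global RNG, then draw n actions
    # from a uniform categorical policy over 4 actions.
    torch.manual_seed(seed)
    dist = torch.distributions.Categorical(logits=torch.zeros(4))
    return [dist.sample().item() for _ in range(n)]

# Two "workers" seeded identically produce identical action sequences,
# which is exactly the behaviour described above.
assert sample_actions(0) == sample_actions(0)
```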

The main, model, and distributions code is as follows:

  1. main: https://paste.ubuntu.com/p/JFDgrrkWPF/
  2. model: https://paste.ubuntu.com/p/NgBdXPgdnb/
  3. distribution: https://paste.ubuntu.com/p/3Y9fMc6HzF/

Problem solved! Thanks to this code: https://github.com/ikostrikov/pytorch-a3c/blob/master/train.py
I added torch.manual_seed(params.seed + rank) in each worker, which guarantees variation between different workers while keeping my algorithm reproducible.