While implementing the DPPO algorithm in PyTorch, I found that all of my workers act exactly the same, and I don't know why. In my game environment every worker starts from the same initial state, and each worker samples its actions with the sample() method of a torch.distributions distribution. In my main function I set
np.random.seed(params.seed)
torch.manual_seed(params.seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
to guarantee the reproducibility of the algorithm. I suspect this global seeding is what eliminates the variation between workers, but I don't know how to fix it. How can I solve this problem? Help me!
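Since the post doesn't include my worker code, here is a minimal, hypothetical sketch of what I think is happening, using NumPy instead of the actual torch.distributions call just to illustrate the seeding behaviour. `sample_actions` and `use_rank_offset` are made-up names for illustration; the rank-offset idea is a common pattern in distributed RL, not necessarily the right fix for my setup:

```python
import numpy as np

def sample_actions(seed, rank, use_rank_offset=False):
    # Hypothetical stand-in for one DPPO worker's action sampling.
    # If every worker builds its RNG from the same seed, all workers
    # draw the exact same "random" action sequence.
    rng = np.random.default_rng(seed + rank if use_rank_offset else seed)
    return rng.integers(0, 4, size=20).tolist()

# All workers seeded identically -> identical action sequences.
a0 = sample_actions(42, rank=0)
a1 = sample_actions(42, rank=1)
assert a0 == a1

# Offsetting the seed by the worker's rank decorrelates the streams.
b0 = sample_actions(42, rank=0, use_rank_offset=True)
b1 = sample_actions(42, rank=1, use_rank_offset=True)
assert b0 != b1
```

If this is indeed the cause, would calling something like torch.manual_seed(params.seed + rank) inside each worker process keep the run reproducible while still letting workers explore differently?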