Different outputs for the same seed in "reinforce.py"

Hello,

I am doing some experimentation in reinforcement learning with PyTorch. I am not sure why reinforce.py in the RL examples folder produces different output on each run, even though the seed is set to 543 by default. Here are the outputs from two runs:

OUTPUT OF 1st RUN:

Episode 10 Last length: 13 Average length: 10.64
Episode 20 Last length: 24 Average length: 11.37
Episode 30 Last length: 115 Average length: 15.63
Episode 40 Last length: 17 Average length: 19.16
Episode 50 Last length: 77 Average length: 22.33
Episode 60 Last length: 52 Average length: 24.56
Episode 70 Last length: 67 Average length: 28.63
Episode 80 Last length: 199 Average length: 40.23
Episode 90 Last length: 116 Average length: 53.10
Episode 100 Last length: 32 Average length: 54.78
Episode 110 Last length: 112 Average length: 58.36
Episode 120 Last length: 199 Average length: 70.91
Episode 130 Last length: 199 Average length: 83.16
Episode 140 Last length: 199 Average length: 94.23
Episode 150 Last length: 69 Average length: 95.64
Episode 160 Last length: 199 Average length: 97.84
Episode 170 Last length: 199 Average length: 107.51
Episode 180 Last length: 199 Average length: 112.87
Episode 190 Last length: 199 Average length: 116.42
Episode 200 Last length: 66 Average length: 122.85

OUTPUT OF 2nd RUN:

Episode 10 Last length: 13 Average length: 10.64
Episode 20 Last length: 24 Average length: 11.37
Episode 30 Last length: 115 Average length: 15.63
Episode 40 Last length: 17 Average length: 19.16
Episode 50 Last length: 77 Average length: 22.33
Episode 60 Last length: 52 Average length: 24.56
Episode 70 Last length: 67 Average length: 28.63
Episode 80 Last length: 199 Average length: 40.23
Episode 90 Last length: 113 Average length: 53.07
Episode 100 Last length: 199 Average length: 62.14
Episode 110 Last length: 199 Average length: 74.02
Episode 120 Last length: 70 Average length: 78.82
Episode 130 Last length: 51 Average length: 79.03
Episode 140 Last length: 108 Average length: 78.31
Episode 150 Last length: 124 Average length: 81.46
Episode 160 Last length: 127 Average length: 86.14
Episode 170 Last length: 53 Average length: 87.84
Episode 180 Last length: 85 Average length: 86.07
Episode 190 Last length: 91 Average length: 86.13
Episode 200 Last length: 112 Average length: 87.26

Notice that the outputs start to differ from episode 90 onwards. I also tried seeding np.random with the same value, but with no success.
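
For reference, the seeding I ended up with looks roughly like this (reconstructed from memory, so the exact environment name and calls may not match the script line for line):

    import random
    import numpy as np
    import torch
    import gym

    seed = 543
    env = gym.make('CartPole-v0')  # the environment used by the example, as far as I remember
    env.seed(seed)                 # seed the environment's own RNG
    torch.manual_seed(seed)        # seed PyTorch's CPU generator
    np.random.seed(seed)           # the extra call I added, which made no difference
    random.seed(seed)              # Python's built-in RNG, just in case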

Any idea what the source of the non-determinism in the script could be? Or is there a bug?

This is an interesting one that I’ve seen before in similar situations. What’s going on is a little complex, and there is nothing you can do about it.

I’m going to guess that you’re running this on the GPU? It turns out that reductions on the GPU (such as the sums inside matrix multiplication and softmax) are slightly non-deterministic, in that the reduce operation is not applied in the same order every time. Mathematically these reductions are associative and commutative, so the order shouldn’t matter, but because we are working with floats (I’m guessing 32-bit?) we get a slightly different rounded value when the sum is computed in a different order.

Initially this difference in rounding is tiny and doesn’t show up in your printed output at all (if you print more decimal places you should see the effect earlier), but it grows as the finite-precision errors accumulate. Eventually the values become different enough to push your optimisation off in a completely different direction, and you get significant divergences like the ones above.
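
You can see the underlying effect in isolation with a plain floating-point example (this is just a generic rounding demo, nothing specific to reinforce.py):

    import numpy as np

    # Floating-point addition is not associative, so the order of a reduction matters.
    a, b, c = 0.1, 0.2, 0.3
    print((a + b) + c == a + (b + c))   # False: 0.6000000000000001 vs 0.6

    # The same effect at the scale of a large reduction: summing identical values
    # in a different order usually changes the last few bits of the result.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)
    print(np.sum(x), np.sum(rng.permutation(x)))  # typically not bit-identical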

There is nothing you can do about this if you are after bit-for-bit repeatable results on the GPU; reductions of this kind on the GPU are inherently non-deterministic. However, you can try:

  • running your experiments repeatedly (with different weight initialisations as well) and averaging the results to get a more robust estimate of your score (see the sketch after this list)
  • running your experiments for a shorter time, stopping before the divergence occurs
  • running your experiments on a single CPU core
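
For the first point, a minimal sketch might look like the following, where train_one_run is a hypothetical stand-in for reinforce.py’s training loop rather than code from the example:

    import numpy as np
    import torch

    def train_one_run(seed):
        """Hypothetical stand-in for one full training run with a given seed."""
        torch.manual_seed(seed)
        np.random.seed(seed)
        # ... run the actual training loop here and return its final average length.
        # A random placeholder keeps this sketch runnable on its own:
        return float(np.random.normal(loc=100.0, scale=20.0))

    scores = [train_one_run(seed) for seed in (543, 544, 545, 546, 547)]
    print(f"mean {np.mean(scores):.2f} +/- {np.std(scores):.2f} over {len(scores)} runs")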

Hey Tom, thanks for the explanation. That would make sense if I were using a GPU, but the results shown here come from running the script as is on my MacBook’s CPU. There is no explicit threading or parallelism in the script, so I assume it runs on a single core. Could someone reproduce these results?
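
In case it is relevant, this is the kind of thing I would use to check and pin the thread count (I have not verified that it changes anything here):

    import torch

    print(torch.get_num_threads())  # number of intra-op threads PyTorch will use
    torch.set_num_threads(1)        # pin to a single thread before training starts
    # Alternatively, launch with: OMP_NUM_THREADS=1 python reinforce.py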