Hello,
I am experimenting with reinforcement learning in PyTorch. I am not sure why reinforce.py in the RL example folder produces different output on each run, even though the seed is set to 543 by default. Here are the outputs from two runs:
OUTPUT OF 1st RUN:
Episode 10 Last length: 13 Average length: 10.64
Episode 20 Last length: 24 Average length: 11.37
Episode 30 Last length: 115 Average length: 15.63
Episode 40 Last length: 17 Average length: 19.16
Episode 50 Last length: 77 Average length: 22.33
Episode 60 Last length: 52 Average length: 24.56
Episode 70 Last length: 67 Average length: 28.63
Episode 80 Last length: 199 Average length: 40.23
Episode 90 Last length: 116 Average length: 53.10
Episode 100 Last length: 32 Average length: 54.78
Episode 110 Last length: 112 Average length: 58.36
Episode 120 Last length: 199 Average length: 70.91
Episode 130 Last length: 199 Average length: 83.16
Episode 140 Last length: 199 Average length: 94.23
Episode 150 Last length: 69 Average length: 95.64
Episode 160 Last length: 199 Average length: 97.84
Episode 170 Last length: 199 Average length: 107.51
Episode 180 Last length: 199 Average length: 112.87
Episode 190 Last length: 199 Average length: 116.42
Episode 200 Last length: 66 Average length: 122.85
OUTPUT OF 2nd RUN:
Episode 10 Last length: 13 Average length: 10.64
Episode 20 Last length: 24 Average length: 11.37
Episode 30 Last length: 115 Average length: 15.63
Episode 40 Last length: 17 Average length: 19.16
Episode 50 Last length: 77 Average length: 22.33
Episode 60 Last length: 52 Average length: 24.56
Episode 70 Last length: 67 Average length: 28.63
Episode 80 Last length: 199 Average length: 40.23
Episode 90 Last length: 113 Average length: 53.07
Episode 100 Last length: 199 Average length: 62.14
Episode 110 Last length: 199 Average length: 74.02
Episode 120 Last length: 70 Average length: 78.82
Episode 130 Last length: 51 Average length: 79.03
Episode 140 Last length: 108 Average length: 78.31
Episode 150 Last length: 124 Average length: 81.46
Episode 160 Last length: 127 Average length: 86.14
Episode 170 Last length: 53 Average length: 87.84
Episode 180 Last length: 85 Average length: 86.07
Episode 190 Last length: 91 Average length: 86.13
Episode 200 Last length: 112 Average length: 87.26
Notice that the logged values are identical through episode 80 and first differ at episode 90. I also tried setting the same seed for np.random, but the runs still diverge.
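In case it helps anyone reproduce the comparison, here is a minimal sketch of how the first diverging log line can be located. It is not part of reinforce.py; the helper name `first_divergence` and the variables `run_1` and `run_2` are hypothetical, and the strings are just the episode 80 and 90 lines copied from the two runs above.

```python
from itertools import zip_longest


def first_divergence(run_a, run_b):
    """Return (index, line_a, line_b) for the first differing line, or None.

    Hypothetical helper: compares two logs line by line. If one log is
    shorter, the missing line is reported as None.
    """
    lines_a = run_a.splitlines()
    lines_b = run_b.splitlines()
    for i, (a, b) in enumerate(zip_longest(lines_a, lines_b)):
        if a != b:
            return i, a, b
    return None


# Excerpts copied from the two runs above (episodes 80 and 90 only).
run_1 = (
    "Episode 80 Last length: 199 Average length: 40.23\n"
    "Episode 90 Last length: 116 Average length: 53.10"
)
run_2 = (
    "Episode 80 Last length: 199 Average length: 40.23\n"
    "Episode 90 Last length: 113 Average length: 53.07"
)

print(first_divergence(run_1, run_2))
```

On the full logs this only narrows the divergence down to somewhere between the episode 80 and episode 90 log lines, since the script prints every 10 episodes.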
Any idea what could be the source of the non-determinism in the script? Or is this a bug?