A section to discuss RL implementations, research, problems
The reinforce.py and actor-critic.py examples never really converge to a running_reward > 200 for me. Did anyone get them to work? I found that the running reward reached 199 quite often, after which the rewards started to decrease. Does anyone have a similar experience?
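One possible explanation for the 199 plateau, sketched under assumptions: if running_reward is an exponential moving average of the episode length (assumed here to start at 10 with a decay of 0.99), and the environment caps every episode at 200 steps (also an assumption, as in CartPole-v0-style setups), then the average can approach 200 but never exceed it. The function name and constants below are hypothetical and only illustrate the arithmetic; they are not taken from the example scripts.

```python
def running_reward_trace(episode_lengths, start=10.0, decay=0.99):
    """Return the exponentially smoothed reward after each episode.

    Assumed update rule: running = running * decay + t * (1 - decay).
    """
    running = start
    trace = []
    for t in episode_lengths:
        running = running * decay + t * (1 - decay)
        trace.append(running)
    return trace


CAP = 200  # assumed per-episode step limit (hypothetical, CartPole-v0 style)

# Best case: a perfect agent that hits the cap in every one of 1000 episodes.
trace = running_reward_trace([CAP] * 1000)

# Even then the smoothed value only approaches the cap from below.
print("final running reward:", trace[-1])
print("ever above cap:", any(r > CAP for r in trace))
```

If these assumptions hold for your setup, a stopping condition of running_reward > 200 can never fire, and a threshold below the cap would be needed for the script to report success.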
I had the same problem: it only reached 199 in my environment, then went back and forth...
I got it to work the first few times I ran it, but later, without any changes, the same situation you mentioned happened. Very weird. I thought it was using the same random seed throughout.