Gym: Pendulum-v0 not solvable by vanilla policy gradient? Increase max torque?

zuoxingdong · October 19, 2017, 9:02am

The default max torque is +/- 2 and the max speed is +/- 8. According to some solutions, the pendulum needs to swing back and forth several times before it can balance upright. I guess it is not solvable by vanilla policy gradient with a 1-layer MLP with 50 neurons. What would be good values for max torque and max speed so that the pendulum needs to swing only once or twice to balance upright?
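One rough way to explore this question is to simulate a constant push from the bottom and check whether the pendulum reaches the top without swinging back. The sketch below is plain Python, not the Gym environment itself. The constants (g = 10, m = 1, l = 1, dt = 0.05) and the update rule are assumptions taken from the Pendulum-v0 source, and `reaches_upright` is a hypothetical helper name. Under those assumptions the gravity term peaks at 3g/(2l) = 15 and the torque term is 3u/(ml^2) = 3u, so a torque of 5 or more should overcome gravity at every angle.

```python
import math

# Constants assumed from the Gym Pendulum-v0 source (not stated in this thread).
G, M, L, DT = 10.0, 1.0, 1.0, 0.05


def reaches_upright(max_torque, max_speed=8.0, steps=200):
    """Push with constant +max_torque from rest at the bottom.

    Returns True if the pendulum reaches the top without ever reversing
    the torque, False if it has not got there after `steps` steps.
    """
    th, thdot = math.pi, 0.0  # th = pi hangs down, th = 0 (or 2*pi) is upright
    for _ in range(steps):
        accel = (-3 * G / (2 * L) * math.sin(th + math.pi)
                 + 3.0 / (M * L ** 2) * max_torque)
        thdot = thdot + accel * DT
        th = th + thdot * DT
        # Speed limit, applied after the angle update as in the assumed dynamics.
        thdot = max(-max_speed, min(max_speed, thdot))
        if th >= 2 * math.pi:
            return True
    return False


for torque in (2.0, 3.0, 4.0, 5.0, 8.0):
    print(torque, reaches_upright(torque))
```

This only tests a single constant push, so a `False` means the pendulum has to swing back and forth, not that the task is impossible at that torque.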

florin · April 13, 2018, 5:56pm

Out of curiosity, were you able to learn Pendulum-v0 with policy gradient?

JACKHAHA363 · May 13, 2018, 6:19pm

I changed the max/min torque to +8/-8 and am still unable to solve it with REINFORCE or REINFORCE with a baseline. Maybe I need to tune it more.
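For reference, here is a minimal sketch of the quantities that REINFORCE with a baseline weights the log-probabilities by. It is not the code used above. The helper names `discounted_returns` and `centered` are made up for illustration, and the baseline here is simply the mean return of the episode, which is only one possible choice. Pendulum rewards are never positive, so without a baseline every action gets pushed down, just by different amounts.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * r_{t+1} + ... for every step t."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def centered(returns):
    """Subtract the mean return as a simple constant baseline."""
    mean = sum(returns) / len(returns)
    return [g - mean for g in returns]


# Toy episode with Pendulum-style (non-positive) rewards.
rewards = [-4.0, -2.0, -1.0]
returns = discounted_returns(rewards, gamma=0.5)
weights = centered(returns)
print(returns)  # all negative
print(weights)  # mixed signs after centering
```

The policy loss would then be the negative sum of `weight * log_prob(action)` over the episode.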

alexis-jacq · May 13, 2018, 8:22pm

Indeed, REINFORCE is not that good at learning features through linear layers. Adding a value prediction speeds up the learning of relevant features in the hidden layer. That’s why actor-critic is much more stable and still works with 30 neurons.
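To make the difference concrete, here is a small sketch of the one-step TD error that a typical actor-critic uses in place of the full Monte Carlo return. It assumes a critic that outputs one value per state and leaves out the network itself. `td_errors` is a hypothetical helper name, and the numbers are made up.

```python
def td_errors(rewards, values, gamma=0.99):
    """One-step TD errors: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` needs len(rewards) + 1 entries; the last one is the critic's
    estimate for the state reached after the final reward.
    """
    return [r + gamma * values[t + 1] - values[t]
            for t, r in enumerate(rewards)]


# A critic that already predicts the remaining reward gives zero error.
rewards = [-1.0, -1.0]
values = [-2.0, -1.0, 0.0]
deltas = td_errors(rewards, values, gamma=1.0)
print(deltas)
```

The actor is updated with `delta * log_prob(action)` and the critic with `delta ** 2`. If both heads sit on the same hidden layer, the critic's gradient also shapes the features the actor uses.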