Proximal Policy Optimization


I just wanted to share this implementation of Proximal Policy Optimization for the C++ API of PyTorch with you. Feedback is much appreciated. I am struggling to get the algorithm to converge on problems harder than the one shown on GitHub.


Thanks a lot for sharing, Martin!


PPO is a tricky one to fine-tune. For me, it took a lot of trial and error to get it working in my own implementation. You may want to take a look at the hyperparameters I used for some OpenAI Gym problems; maybe they will work for you too. On the harder problems, you may also want to try normalizing the state if it is complex. I hope it helps :)
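To make the state normalization suggestion a bit more concrete, here is a rough sketch in plain C++ of one common way to do it: keep a running mean and variance per state dimension (Welford's online algorithm) and standardize each observation before it goes into the network. The class name `RunningNormalizer`, its methods, and the `eps` default are made up for illustration; they are not taken from the shared implementation or from PyTorch, and the same idea could be written with tensors instead of `std::vector`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (an assumption, not part of the shared code):
// tracks per-dimension running statistics of the observed states.
class RunningNormalizer {
public:
    explicit RunningNormalizer(std::size_t dim, double eps = 1e-8)
        : mean_(dim, 0.0), m2_(dim, 0.0), eps_(eps) {}

    // Welford's online update of the mean and the sum of squared deviations.
    void update(const std::vector<double>& x) {
        assert(x.size() == mean_.size());
        ++count_;
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double delta = x[i] - mean_[i];
            mean_[i] += delta / static_cast<double>(count_);
            m2_[i] += delta * (x[i] - mean_[i]);
        }
    }

    // Returns (x - mean) / sqrt(var + eps) per dimension.
    // Until two samples have been seen, the variance is taken to be 1.
    std::vector<double> normalize(const std::vector<double>& x) const {
        assert(x.size() == mean_.size());
        std::vector<double> out(x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            const double var =
                count_ > 1 ? m2_[i] / static_cast<double>(count_ - 1) : 1.0;
            out[i] = (x[i] - mean_[i]) / std::sqrt(var + eps_);
        }
        return out;
    }

    const std::vector<double>& mean() const { return mean_; }

private:
    std::vector<double> mean_;
    std::vector<double> m2_;
    double eps_;
    std::size_t count_ = 0;
};
```

In a training loop you would typically call `update` on every state returned by the environment and feed `normalize(state)` to the policy and value networks. Whether to freeze the statistics during evaluation is a design choice that depends on your setup.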
