PPO for Discrete Action Spaces (CartPole)

ajberlier · September 22, 2024, 8:12pm

I followed the TorchRL getting started documentation and am running into an issue: PPO does not learn the CartPole environment. The tutorial learns the double pendulum environment without problems, but when I switch to CartPole and modify the probabilistic actor for a discrete output, it does not learn. Perhaps I am not making the proper modifications?
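
To make the intended change concrete, here is a minimal sketch of what a discrete-output policy involves. It is written in plain Python rather than with TorchRL classes, so it is not the tutorial's code. It assumes the policy network emits one logit per action (CartPole has two actions). The function names, the example logits, and the clip value of 0.2 are illustrative assumptions.

```python
import math
import random


def log_softmax(logits):
    """Turn raw logits into log-probabilities (numerically stable)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def sample_action(logits, rng):
    """Sample a discrete action and return it with its log-probability."""
    log_probs = log_softmax(logits)
    probs = [math.exp(lp) for lp in log_probs]
    action = rng.choices(range(len(logits)), weights=probs, k=1)[0]
    return action, log_probs[action]


def ppo_clip_objective(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one sample (to be maximized)."""
    ratio = math.exp(new_log_prob - old_log_prob)
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


rng = random.Random(0)
logits = [0.3, -0.1]  # made-up network output for a single observation
action, log_prob = sample_action(logits, rng)
print(action, log_prob)
```

The point of the sketch is that the actor's output is a vector of logits over actions instead of a location and scale, and that the log-probability stored at collection time has to come from the same categorical distribution that the loss later re-evaluates.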

Thank you for any help you can offer.

ajberlier · September 22, 2024, 8:32pm

https://www.reddit.com/r/reinforcementlearning/comments/1eer8iv/why_is_my_ppo_algorithm_not_learning/

This Reddit post sorted things out for me. I am still not sure why this is the case, though, so an explanation would still be great. Thank you!

vmoens · September 23, 2024, 5:49am

We fixed these tutorials a few weeks ago, so it may be worth checking the version on the main branch:
https://pytorch.org/rl/main/tutorials/getting-started-5.html