PPO + LSTM working code

Hi,

I am looking for a PPO + LSTM implementation.
Can someone please point me to available working code in PyTorch for PPO + LSTM?

Thanks

Hi,
I am not sure if it’s too late to answer this, but I came across this implementation of PPO with LSTM: https://github.com/seungeunrho/minimalRL/blob/master/ppo-lstm.py
The code is quite simple and easy to follow.
Hope it helps.

Hi @granth_jain,
did you find a suitable implementation?

Unfortunately, the one proposed above is not really a good choice: it uses truncated BPTT with a sequence length of 1. Also, CartPole is not a good environment for checking whether a recurrent policy actually works, even if you mask out the velocities in the agent’s observation space.
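For reference, “masking out the velocities” usually means something like the wrapper below. This is only a minimal sketch using Gymnasium, and it assumes the standard CartPole observation layout [cart position, cart velocity, pole angle, pole angular velocity]:

```python
import numpy as np
import gymnasium as gym


class MaskVelocityWrapper(gym.ObservationWrapper):
    """Zero out the velocity entries of CartPole observations so the
    policy has to infer them from past observations (i.e. it needs memory)."""

    def observation(self, obs):
        obs = np.array(obs, dtype=np.float32)
        obs[[1, 3]] = 0.0  # cart velocity and pole angular velocity
        return obs


env = MaskVelocityWrapper(gym.make("CartPole-v1"))
obs, info = env.reset()
```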

Not exactly the solution you asked for, but a working LSTM implementation (and not yet another CartPole project). If you increase the sequence_length fed to the model and make the reward in the step function more complex, you can test how well the model learns to remember sequences and relations from the past: GitHub - svenkroll/simple_RL-LSTM: A simple demonstration of how to train an LSTM model with Reinforcement Learning using PyTorch
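To make the idea concrete: the reward at the current step has to depend on something observed several steps earlier, so a memoryless policy cannot solve the task. The toy environment below is my own illustration of that kind of step function, not code from the linked repo:

```python
import numpy as np


class RecallEnv:
    """Toy memory task: a cue (0 or 1) is shown at the first step and the
    agent is rewarded only if it repeats that cue `delay` steps later."""

    def __init__(self, delay=8):
        self.delay = delay

    def reset(self):
        self.t = 0
        self.cue = np.random.randint(2)
        return np.array([self.cue, 1.0], dtype=np.float32)  # [cue value, cue flag]

    def step(self, action):
        self.t += 1
        done = self.t >= self.delay
        # Reward only at the final step, and only if the agent recalls the cue.
        reward = float(done and action == self.cue)
        obs = np.array([0.0, 0.0], dtype=np.float32)  # cue is hidden after step 0
        return obs, reward, done, {}
```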

Clean baseline repositories

PPO + LSTM (or GRU)

https://github.com/MarcoMeter/recurrent-ppo-truncated-bptt

PPO + Transformer-XL

https://github.com/MarcoMeter/episodic-transformer-memory-ppo
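For a quick sense of what these baselines do under the hood, here is a minimal sketch of a recurrent actor-critic as it would be used inside PPO with truncated BPTT. It is my own illustration, assuming a discrete action space; the repos above handle sequence padding, masking, and hidden-state resets much more carefully:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class RecurrentActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_size=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_size), nn.ReLU())
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, act_dim)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim); `hidden` carries (h, c) across chunks
        x = self.encoder(obs_seq)
        x, hidden = self.lstm(x, hidden)
        dist = Categorical(logits=self.policy_head(x))
        value = self.value_head(x).squeeze(-1)
        return dist, value, hidden


# During rollouts: step with seq_len == 1 and keep `hidden` between steps.
# During the PPO update: replay fixed-length sequences (truncated BPTT) and
# recompute log-probs/values from the stored initial hidden state of each chunk.
model = RecurrentActorCritic(obs_dim=4, act_dim=2)
obs = torch.zeros(1, 1, 4)
dist, value, hidden = model(obs)
action = dist.sample()
```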

This is entirely supported in torchrl.
Here’s an example with DQN

I know some folks have open-source code with PPO and LSTM/GRU.
You will find real-life examples in this repo, for instance: GitHub - Acellera/acegen-open: Language models for drug discovery using torchrl

Happy to provide more context if needed!