Hi,
I am looking for a PPO + LSTM implementation.
Could someone point me to available working PyTorch code for PPO + LSTM?
Thanks
Hi,
I am not sure if it’s too late to answer this, but I came across this implementation of PPO with LSTM: https://github.com/seungeunrho/minimalRL/blob/master/ppo-lstm.py
The code is quite simple and easy to follow.
Hope it helps.
Hi @granth_jain
Did you find a suitable implementation?
Unfortunately, the one proposed above is not really a good choice: it uses truncated BPTT with a sequence length of 1. Also, CartPole is not a good environment for testing whether a recurrent policy actually works, even if you mask out the velocities in the agent’s observation space.
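For context, masking the velocities is typically done with an observation wrapper. A minimal sketch, assuming Gymnasium (the wrapper name and mask are illustrative, not from any of the repos mentioned here):

```python
import numpy as np
import gymnasium as gym

class MaskVelocityWrapper(gym.ObservationWrapper):
    """Zero out the velocity entries of CartPole's observation so the
    task becomes partially observable and actually requires memory."""

    def __init__(self, env):
        super().__init__(env)
        # CartPole-v1 observation: [cart position, cart velocity,
        #                           pole angle, pole angular velocity]
        self.mask = np.array([1.0, 0.0, 1.0, 0.0], dtype=np.float32)

    def observation(self, obs):
        return obs * self.mask

env = MaskVelocityWrapper(gym.make("CartPole-v1"))
```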
Not exactly the solution you asked for, but a working LSTM implementation (and not another CartPole project). If you increase the sequence_length fed to the model and provide a more complex reward in the step function, you can test how the model learns to remember sequences and relations from the past: GitHub - svenkroll/simple_RL-LSTM: A simple demonstration of how to train an LSTM model with Reinforcement Learning using PyTorch
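To make the sequence_length idea concrete, here is a minimal sketch of chunking a rollout into fixed-length windows for truncated BPTT (the helper name and zero-padding choice are my own, not taken from the repo):

```python
import torch

def split_rollout(obs, seq_len):
    """Chunk a rollout of shape (T, obs_dim) into (num_seqs, seq_len, obs_dim)
    windows for truncated BPTT; the last partial window is zero-padded."""
    T, obs_dim = obs.shape
    num_seqs = -(-T // seq_len)  # ceiling division
    padded = obs.new_zeros(num_seqs * seq_len, obs_dim)
    padded[:T] = obs
    return padded.view(num_seqs, seq_len, obs_dim)

# A rollout of 10 steps with 4 features, split into windows of length 4:
sequences = split_rollout(torch.randn(10, 4), seq_len=4)
print(sequences.shape)  # torch.Size([3, 4, 4])
```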
Clean baseline repositories
PPO + LSTM (or GRU)
https://github.com/MarcoMeter/recurrent-ppo-truncated-bptt
PPO + Transformer-XL
https://github.com/MarcoMeter/episodic-transformer-memory-ppo
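For orientation, the core of such recurrent PPO implementations is an actor-critic network with an LSTM trunk whose hidden state is carried across steps during the rollout and reset at episode boundaries. A minimal sketch (sizes and layer layout are illustrative, not taken from either repo):

```python
import torch
import torch.nn as nn

class RecurrentActorCritic(nn.Module):
    def __init__(self, obs_dim, num_actions, hidden_size=128):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, num_actions)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, obs, hidden):
        # obs: (batch, seq_len, obs_dim); hidden: (h, c) tuple or None.
        # During rollout seq_len is 1; during the PPO update the same
        # module is fed whole sequences (truncated BPTT).
        features, hidden = self.lstm(obs, hidden)
        logits = self.policy_head(features)  # action distribution parameters
        value = self.value_head(features)    # state-value estimate
        return logits, value, hidden

net = RecurrentActorCritic(obs_dim=4, num_actions=2)
logits, value, hidden = net(torch.randn(1, 8, 4), hidden=None)
```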
This is entirely supported in torchrl.
Here’s an example with DQN
I know some folks have open-sourced code with PPO and LSTM/GRU.
You will find real-life examples in this repo, for instance: GitHub - Acellera/acegen-open: Language models for drug discovery using torchrl
Happy to provide more context if needed!
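If it helps, the usual TorchRL pattern is to wrap the recurrent cell in an LSTMModule and let the environment carry the hidden state via a primer transform. A rough sketch along the lines of the recurrent-DQN tutorial (keys and sizes are my assumptions, so double-check against the current torchrl docs):

```python
from torchrl.envs import GymEnv, TransformedEnv, InitTracker
from torchrl.modules import LSTMModule

# InitTracker adds an "is_init" flag, which LSTMModule uses to reset
# its hidden state at episode boundaries.
env = TransformedEnv(GymEnv("CartPole-v1"), InitTracker())

lstm = LSTMModule(
    input_size=env.observation_spec["observation"].shape[-1],
    hidden_size=128,
    in_key="observation",
    out_key="embed",
)
# The primer makes the env carry the recurrent state in its tensordicts.
env.append_transform(lstm.make_tensordict_primer())

td = env.reset()
td = lstm(td)  # writes "embed" and the next hidden state into td
```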