PPO + LSTM working code

Hi,

I am looking for a PPO + LSTM implementation.
Can someone please point me to available working code in PyTorch for PPO + LSTM?

Thanks

Hi,
I am not sure if it’s too late to answer this, but I came across this implementation of PPO with LSTM: https://github.com/seungeunrho/minimalRL/blob/master/ppo-lstm.py
The code is quite simple and easy to follow.
Hope it helps.

Hi @granth_jain,
did you find a suitable implementation?

Unfortunately, the one proposed above is not really a good choice: it uses truncated BPTT with a sequence length of 1. Also, CartPole is not a good environment for checking whether a recurrent policy actually works, even if you mask out the velocities in the agent’s observation space.
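For reference, “masking out the velocities” usually means something like the wrapper below. This is only a minimal sketch using Gymnasium, and it assumes the standard CartPole observation layout [cart position, cart velocity, pole angle, pole angular velocity]:

```python
import numpy as np
import gymnasium as gym


class MaskVelocityWrapper(gym.ObservationWrapper):
    """Zero out the velocity entries of CartPole observations so the
    policy has to infer them from past observations (i.e. it needs memory)."""

    def observation(self, obs):
        obs = np.array(obs, dtype=np.float32)
        obs[[1, 3]] = 0.0  # cart velocity and pole angular velocity
        return obs


env = MaskVelocityWrapper(gym.make("CartPole-v1"))
obs, info = env.reset()
```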

Not exactly the solution you asked for, but a working LSTM implementation (and not yet another CartPole project). If you increase the sequence_length fed to the model and make the reward in the step function more complex, you can test how well the model learns to remember sequences and relations from the past: GitHub - svenkroll/simple_RL-LSTM: A simple demonstration of how to train an LSTM model with Reinforcement Learning using PyTorch
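To make the idea concrete: the reward at the current step has to depend on something observed several steps earlier, so a memoryless policy cannot solve the task. The toy environment below is my own illustration of that kind of step function, not code from the linked repo:

```python
import numpy as np


class RecallEnv:
    """Toy memory task: a cue (0 or 1) is shown at the first step and the
    agent is rewarded only if it repeats that cue `delay` steps later."""

    def __init__(self, delay=8):
        self.delay = delay

    def reset(self):
        self.t = 0
        self.cue = np.random.randint(2)
        return np.array([self.cue, 1.0], dtype=np.float32)  # [cue value, cue flag]

    def step(self, action):
        self.t += 1
        done = self.t >= self.delay
        # Reward only at the final step, and only if the agent recalls the cue.
        reward = float(done and action == self.cue)
        obs = np.array([0.0, 0.0], dtype=np.float32)  # cue is hidden after step 0
        return obs, reward, done, {}
```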

Clean baseline repositories

PPO + LSTM (or GRU)

https://github.com/MarcoMeter/recurrent-ppo-truncated-bptt

PPO + Transformer-XL

https://github.com/MarcoMeter/episodic-transformer-memory-ppo
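For a quick sense of what these baselines do under the hood, here is a minimal sketch of a recurrent actor-critic as it would be used inside PPO with truncated BPTT. It is my own illustration, assuming a discrete action space; the repos above handle sequence padding, masking, and hidden-state resets much more carefully:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class RecurrentActorCritic(nn.Module):
    def __init__(self, obs_dim, act_dim, hidden_size=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_size), nn.ReLU())
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, act_dim)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, seq_len, obs_dim); `hidden` carries (h, c) across chunks
        x = self.encoder(obs_seq)
        x, hidden = self.lstm(x, hidden)
        dist = Categorical(logits=self.policy_head(x))
        value = self.value_head(x).squeeze(-1)
        return dist, value, hidden


# During rollouts: step with seq_len == 1 and keep `hidden` between steps.
# During the PPO update: replay fixed-length sequences (truncated BPTT) and
# recompute log-probs/values from the stored initial hidden state of each chunk.
model = RecurrentActorCritic(obs_dim=4, act_dim=2)
obs = torch.zeros(1, 1, 4)
dist, value, hidden = model(obs)
action = dist.sample()
```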

This is entirely supported in torchrl.
Here’s an example with DQN

I know some folks have open-source code with PPO and LSTM/GRU.
You will find real-life examples in this repo, for instance: GitHub - Acellera/acegen-open: Language models for drug discovery using torchrl

Happy to provide more context if needed!