Do I have to reset my LSTM hidden state after each forward pass in reinforcement learning?

I want to build a model that learns a simple two-player card game. There is some randomness in the game, since a player can draw a random card. To let the model remember, in a way, which cards were drawn and which were played, I want to use some LSTM cells, but I am not sure whether I need to pass the previous hidden state of the last LSTM cell into the next forward call or not.

My input for getting an action to play would be:
(1, representation_of_the_current_game_state), which means: (batch_size, representation_of_the_current_game_state),

while for training:
(game_states, representation_of_this_game_state).

I’m using PPO as my training algorithm and passing all the game states of the played game as my batch. Thanks in advance.

The hidden state contains the learned “memory” representation that the model carries across time steps. In your card-game example, you are likely running one forward pass per “turn”, and if you want the model to carry some memory values onward to the next “turn”, you need to do so via the hidden state. Resetting the hidden state discards that memory.
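A minimal sketch of what carrying the hidden state across turns looks like in PyTorch (module name, sizes, and the dummy observations are illustrative assumptions, not your actual game representation):

```python
import torch
import torch.nn as nn

class TurnPolicy(nn.Module):
    """Toy LSTM policy whose hidden state is carried between forward calls."""

    def __init__(self, obs_dim=8, hidden_dim=16, n_actions=4):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs, state=None):
        # obs: (batch, 1, obs_dim) -- one turn at a time.
        # state=None makes nn.LSTM start from zeros (a fresh game).
        out, state = self.lstm(obs, state)
        return self.head(out[:, -1]), state

policy = TurnPolicy()
state = None  # reset only at the START of a game, not every forward pass
for turn in range(3):
    obs = torch.randn(1, 1, 8)           # dummy game-state representation
    logits, state = policy(obs, state)   # pass state back in: memory persists
```

Only setting `state = None` again (at the start of a new game) resets the memory; re-feeding the returned `state` keeps it.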

In a game of cards, it may be helpful to have a memory state, as the current game state likely won’t represent the cards already used and no longer available in the deck - i.e. reasoning by deduction.

Keep in mind, when passing the hidden state forward, to call .detach() on it between forward passes, so you don’t carry gradients (and an ever-growing autograd graph) between passes.
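Concretely, the detach step looks like this (sizes and the random observations are placeholder assumptions):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
h = torch.zeros(1, 1, 16)  # (num_layers, batch, hidden_size)
c = torch.zeros(1, 1, 16)

for turn in range(5):
    obs = torch.randn(1, 1, 8)
    out, (h, c) = lstm(obs, (h, c))
    # Keep the VALUES as memory, but drop their gradient history so the
    # autograd graph does not grow across the whole game.
    h, c = h.detach(), c.detach()
```

Without the detach, backward() through a late-game loss would try to backpropagate through every earlier forward pass.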

This might not be compatible with backpropagating through an RNN over time. To make it compatible, you can instead batch across games: pass the turns of n games at each time step, i.e. have 60 games playing at once, with each forward pass advancing every game by one turn (t += 1). You can use a boolean done array to mark the finished games, and filter for active games with games[~done] - that is, the games that are not done.
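A small sketch of that masking idea, with one hidden-state row per game (the sizes and the hand-set done flags are illustrative; in practice your environment sets them):

```python
import torch

n_games, hidden_dim = 60, 16
hidden = torch.zeros(n_games, hidden_dim)      # one hidden row per game
done = torch.zeros(n_games, dtype=torch.bool)  # all games start active

done[:10] = True            # pretend the first 10 games have finished
active = hidden[~done]      # hidden states of the 50 still-running games
# ... run the forward pass / turn t += 1 only for `active` games ...
```

The same `~done` mask can select the observations, actions, and rewards for the active games, so finished games simply drop out of the batch.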