Statistics to monitor the behavior of hidden states (GRU, LSTM)

Hello everybody!

I’m experimenting with different hidden state initializations (learnable, sampled, zero, one, …) and RNN architectures in the context of reinforcement learning. Now I’d like to observe the effects of each experiment, e.g. how the hidden state itself is affected. Right now, I can only monitor the performance of the entire system (mean reward, accuracy, losses, …), but I’d like to track some measures that focus solely on the hidden state. Does anybody have tips or recommendations?

These are the things that came to mind, though I’m not sure whether they are suitable (see the sketch after the list for how some of them might be computed):

  • Gradient scale for each layer
  • KL Divergence
  • Sparsity/density of the activations inside the recurrent cell
  • Magnitude of hidden states
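
For concreteness, here is a minimal PyTorch sketch of how some of these statistics could be computed from a GRU’s final hidden state. The helper name `hidden_state_stats` and the `eps` threshold for “near zero” are my own choices, not an established API; the same function would also work on an LSTM’s h (and c) tensors:

```python
import torch
import torch.nn as nn

def hidden_state_stats(h: torch.Tensor, eps: float = 1e-3) -> dict:
    """Summary statistics for a hidden state of shape (num_layers, batch, hidden_size)."""
    with torch.no_grad():
        return {
            "mean": h.mean().item(),                  # average activation
            "std": h.std().item(),                    # spread of activations
            "l2_norm": h.norm(dim=-1).mean().item(),  # mean magnitude of each state vector
            "abs_max": h.abs().max().item(),          # saturation check (tanh caps at 1)
            "near_zero_frac": (h.abs() < eps).float().mean().item(),  # sparsity proxy
        }

# Toy example: one forward/backward pass through a GRU
gru = nn.GRU(input_size=8, hidden_size=32, num_layers=2)
x = torch.randn(50, 4, 8)  # (seq_len, batch, input_size)
out, h_n = gru(x)
print(hidden_state_stats(h_n))

# Per-layer gradient scale after a (dummy) backward pass
out.sum().backward()
grad_norms = {name: p.grad.norm().item()
              for name, p in gru.named_parameters()}
print(grad_norms)
```

Logging these scalars (e.g. to TensorBoard) every N updates should make drift between the different initialization schemes easy to compare over training.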