Hi, I’m new here and this is my first time using an LSTM network in reinforcement learning. The state, which is also the input to the neural network, is a matrix of size (64, 10000): the batch size is 1, and the sequence length is 10000, so the input is obviously long and large. My goal is to get a (64, 1) continuous output as the action. I use three Conv1d layers before the LSTM to reduce the LSTM input to (64, 64). But training is very slow and unstable, with a lot of fluctuation in the reward curve. Any advice?
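For anyone reading along, here is a minimal sketch of the architecture I'm describing, assuming PyTorch (the post mentions `Conv1d`, which matches PyTorch naming). The kernel sizes, strides, the adaptive-pooling step used to land on exactly 64 timesteps, the hidden size, and the `tanh` action squashing are all my assumptions to make the shapes work, not a definitive implementation:

```python
import torch
import torch.nn as nn

class ConvLSTMPolicy(nn.Module):
    """Sketch: Conv1d downsampling -> LSTM -> continuous action head."""

    def __init__(self, in_channels=64, seq_out=64, hidden=64, action_dim=64):
        super().__init__()
        # Three strided Conv1d layers shrink 10000 steps to 80
        # (10000 -> 2000 -> 400 -> 80 with kernel 5, stride 5),
        # then adaptive pooling fixes the length at exactly seq_out.
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, stride=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, stride=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, stride=5), nn.ReLU(),
            nn.AdaptiveAvgPool1d(seq_out),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, action_dim)

    def forward(self, x):                 # x: (batch, 64, 10000)
        z = self.conv(x)                  # (batch, 64, seq_out)
        z = z.transpose(1, 2)             # (batch, seq_out, 64) for the LSTM
        out, _ = self.lstm(z)
        a = self.head(out[:, -1])         # last timestep's hidden state
        return torch.tanh(a).unsqueeze(-1)  # (batch, 64, 1), bounded in [-1, 1]

state = torch.randn(1, 64, 10000)         # one state: 64 features x 10000 steps
action = ConvLSTMPolicy()(state)
print(action.shape)                        # torch.Size([1, 64, 1])
```

The `transpose` matters: `Conv1d` treats dim 1 as channels, while a `batch_first` LSTM treats dim 1 as time, so the tensor has to be permuted between the two stages.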