Does the combination of Conv1d and LSTM (or GRU) make any sense?

Hi, I’m new here and this is my first time using an LSTM network in reinforcement learning. The state, which is also the input to the neural network, is a matrix of size (64, 10000): the batch size is 1 and the sequence length is 10000, so the input is obviously long and large. My goal is to produce a (64, 1) continuous output as the action. I use three Conv1d layers before the LSTM to reduce its input to (64, 64), but training is very slow and unstable, with a lot of fluctuation in the reward curve. Any advice?
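For concreteness, here is a minimal sketch of the architecture described above: three strided Conv1d layers shrink the 10000-step sequence to roughly 64 steps before the LSTM, and a linear head maps the final hidden state to the 64-dimensional continuous action. The kernel sizes, strides, channel counts, and class name are my own illustrative assumptions, not the poster's actual settings.

```python
import torch
import torch.nn as nn

class ConvLSTMPolicy(nn.Module):
    """Hypothetical Conv1d -> LSTM encoder for a (64, 10000) state.

    Strides/kernels below are guesses chosen so the time axis
    collapses from 10000 to ~64 steps before the LSTM."""

    def __init__(self, in_channels=64, hidden_size=64, action_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=8, stride=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=8, stride=5), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=6, stride=6), nn.ReLU(),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, action_dim)

    def forward(self, x):
        # x: (batch, channels=64, time=10000)
        z = self.conv(x)               # (batch, 64, ~66) after downsampling
        z = z.transpose(1, 2)          # (batch, ~66, 64) for batch_first LSTM
        out, _ = self.lstm(z)
        # Action from the last hidden state; a (1, 64) output can be
        # viewed as the (64, 1) action vector the post asks for.
        return self.head(out[:, -1])

x = torch.randn(1, 64, 10000)
action = ConvLSTMPolicy()(x)
print(action.shape)  # torch.Size([1, 64])
```

This kind of convolutional front-end is a common way to tame very long sequences before a recurrent layer, since an LSTM unrolled over 10000 raw steps is both slow and prone to unstable gradients.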