Reducing LSTM output to a prediction over a smaller (but greater than 1) timespan, while maintaining batches

I’m trying to use 365 days to predict the next 30, for stock prediction, with 5 features (open, high, close, low, volume). I’m using a batch size of 64.

My network looks like this (rough code sketch after the list):

  1. Input > (64, 365, 5)
  2. LSTM > (64, 365, 200)
  3. Linear > (64, 365, 100)
  4. Permute to (64, 100, 365) to fit into batch normalization
  5. Batch Normalization > (64, 100, 365)
  6. Permute to (64, 365, 100) to fit into ReLU
  7. ReLU > (64, 365, 100)
  8. Linear > (64, 365, 30)
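
In code, what I have is roughly this (a minimal sketch of the description above; the module and variable names are just mine):

```python
import torch
import torch.nn as nn

class StockLSTM(nn.Module):
    def __init__(self, n_features=5, lstm_hidden=200, fc_hidden=100, pred_len=30):
        super().__init__()
        self.lstm = nn.LSTM(n_features, lstm_hidden, batch_first=True)  # (64, 365, 5) -> (64, 365, 200)
        self.fc1 = nn.Linear(lstm_hidden, fc_hidden)                    # -> (64, 365, 100)
        self.bn = nn.BatchNorm1d(fc_hidden)                             # expects (N, C, L)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(fc_hidden, pred_len)                       # -> (64, 365, 30)

    def forward(self, x):                # x: (64, 365, 5)
        out, _ = self.lstm(x)            # (64, 365, 200)
        out = self.fc1(out)              # (64, 365, 100)
        out = out.permute(0, 2, 1)       # (64, 100, 365) to fit BatchNorm1d
        out = self.bn(out)               # (64, 100, 365)
        out = out.permute(0, 2, 1)       # (64, 365, 100)
        out = self.relu(out)             # (64, 365, 100)
        return self.fc2(out)             # (64, 365, 30) -- not the shape I want
```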

I'm roughly basing the structure on this.

The output I want is either (64, 30, 5) or (64, 30, 1). The 5 would be the same 5 features as the inputs, and the 1 would just be the average of the four price values or something…
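
For the (64, 30, 1) version, I'm imagining the target being built something like this (assuming the feature order above, so the first four columns are the prices; `future_window` is just a placeholder name for the next-30-days slice):

```python
import torch

future_window = torch.randn(64, 30, 5)         # placeholder for the actual next 30 days of data
prices = future_window[..., :4]                # open, high, close, low -> (64, 30, 4)
target = prices.mean(dim=-1, keepdim=True)     # average of the four price values -> (64, 30, 1)
```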

All the sources I've looked at for this kind of thing do something like `lstm_out.view(-1, hidden_dim)` with the LSTM output, which gives me (23360, 100) and obviously won't work for me because it gets rid of the batch organization. I could do (64, 36500), but that's nothing like what any of the articles I read did, it's a ridiculous number of hidden units, and I don't know how I'd get back to (64, 30, 5) if I went down that route.
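
For reference, the pattern from those sources looks roughly like this (names and sizes are mine, using the 100 hidden units from above; `reshape` here is the same idea as their `view(-1, hidden_dim)`):

```python
import torch
import torch.nn as nn

batch, seq_len, n_features, hidden_dim = 64, 365, 5, 100
lstm = nn.LSTM(n_features, hidden_dim, batch_first=True)
linear = nn.Linear(hidden_dim, 1)

x = torch.randn(batch, seq_len, n_features)
lstm_out, _ = lstm(x)                    # (64, 365, 100)
flat = lstm_out.reshape(-1, hidden_dim)  # (23360, 100) -- batch and time get merged
out = linear(flat)                       # (23360, 1)   -- the batch organization is gone
```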