There are multiple ways to build an autoencoder with an LSTM. At a high level:
Encoder: outputs a state.
Decoder: uses this state as its initial state and tries to reconstruct the encoder's input.
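A minimal sketch of this encoder/decoder setup, assuming PyTorch (the class name, sizes, and the choice of feeding zeros to the decoder are all illustrative, not a fixed recipe):

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, n_features, hidden):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):
        # Encoder: run over the sequence, keep only the final (h, c) state.
        _, (h, c) = self.encoder(x)
        # Decoder: start from the encoder's state and reconstruct the input.
        # Here zeros are fed as decoder inputs; teacher forcing (feeding the
        # shifted target sequence) is a common alternative.
        dec_in = torch.zeros_like(x)
        y, _ = self.decoder(dec_in, (h, c))
        return self.out(y)

x = torch.randn(4, 10, 3)                 # (batch, timesteps, features)
model = LSTMAutoencoder(n_features=3, hidden=16)
recon = model(x)                          # same shape as x
```

Training then minimizes a reconstruction loss such as MSE between `recon` and `x`.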
The state from the encoder can be:
The final state.
The average of the states over the last K timesteps.
A weighted average (soft attention).
The output of a neural network that takes all the encoder states and learns the best summary (a more advanced version of soft attention).
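The last-K average and the soft-attention variants can be illustrated with plain NumPy (the query vector here stands in for a learned parameter; shapes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 10, 16
states = rng.normal(size=(T, d))    # encoder hidden states, one per timestep
query = rng.normal(size=(d,))       # stand-in for a learned query vector

# Option: average the states over the last K timesteps.
K = 3
avg_k = states[-K:].mean(axis=0)    # shape (d,)

# Option: soft attention -- score each state against the query,
# softmax the scores, and take the weighted average of all states.
scores = states @ query
weights = np.exp(scores - scores.max())
weights /= weights.sum()            # attention weights, sum to 1
context = weights @ states          # shape (d,): summary state for the decoder
```

In a real model the scoring function (here a dot product with `query`) would be trained jointly with the encoder and decoder.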
Also, because you are dealing with sequences, you can use multiple decoders to improve the representational power of the encoder: for example, one decoder that reconstructs the current sequence and another that predicts the future, etc.
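A sketch of that multi-decoder ("composite") idea, again assuming PyTorch with zero-fed decoders for simplicity (class and variable names are illustrative):

```python
import torch
import torch.nn as nn

class CompositeLSTMAutoencoder(nn.Module):
    """One shared encoder, two decoders: one reconstructs the input
    sequence, the other predicts the next future_steps timesteps."""
    def __init__(self, n_features, hidden):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.recon_decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.pred_decoder = nn.LSTM(n_features, hidden, batch_first=True)
        self.recon_out = nn.Linear(hidden, n_features)
        self.pred_out = nn.Linear(hidden, n_features)

    def forward(self, x, future_steps):
        # Both decoders start from the same encoder state, so the encoder
        # must learn a representation that serves both tasks.
        _, (h, c) = self.encoder(x)
        zeros_past = torch.zeros_like(x)
        zeros_future = x.new_zeros(x.size(0), future_steps, x.size(2))
        r, _ = self.recon_decoder(zeros_past, (h, c))
        p, _ = self.pred_decoder(zeros_future, (h, c))
        return self.recon_out(r), self.pred_out(p)

x = torch.randn(2, 8, 3)                  # (batch, timesteps, features)
model = CompositeLSTMAutoencoder(n_features=3, hidden=12)
recon, future = model(x, future_steps=4)
```

The training loss would sum a reconstruction term (against `x`) and a prediction term (against the actual future timesteps).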