Training LSTM Network as many-to-one or many-to-many

I have a basic design decision to make and don't really know which way is better.
I am using an LSTM network to classify time series. Later, the network should run online, classifying every incoming timestep.

(1) For training, I could feed a sliding window of the last n timesteps as input (with n high enough/suitable for the task, of course) and use only the last output to classify the last timestep (many-to-one).

(2) I could also feed one long sequence and use every output to classify the corresponding timestep (many-to-many).
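To make the difference concrete, here is a minimal NumPy sketch of the training data each framing produces; the window length `n`, the toy series, and the helper name `make_windows` are just illustrative assumptions:

```python
import numpy as np

def make_windows(x, y, n):
    """Slice a series x of shape (T, features) with labels y of shape (T,)
    into overlapping windows of length n (stride 1)."""
    T = len(x)
    # inputs are the same for both framings: (T-n+1, n, features)
    X = np.stack([x[i:i + n] for i in range(T - n + 1)])
    # many-to-one: one label per window, the label of its last timestep
    y_m2o = y[n - 1:]                                            # (T-n+1,)
    # many-to-many: a label for every timestep in every window
    y_m2m = np.stack([y[i:i + n] for i in range(T - n + 1)])     # (T-n+1, n)
    return X, y_m2o, y_m2m

# toy series: 10 timesteps, 3 features, binary labels
x = np.random.randn(10, 3)
y = np.arange(10) % 2
X, y_m2o, y_m2m = make_windows(x, y, n=4)
print(X.shape, y_m2o.shape, y_m2m.shape)  # (7, 4, 3) (7,) (7, 4)
```

(For large series, `np.lib.stride_tricks.sliding_window_view` avoids copying each window.)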

Which is the better way to do it? In (1), every timestep is classified with the same amount of previous context. In (2), can I still use a sliding window? Then every datapoint would be used n times per epoch, with n = sliding_window_size.
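The duplication in (2)-with-a-sliding-window can be checked directly; this quick sketch (toy values for T and n) counts how often each timestep appears across the windows of one epoch:

```python
import numpy as np

T, n = 10, 4  # toy series length and window size (assumed values)
# index ranges covered by each stride-1 sliding window
windows = [range(i, i + n) for i in range(T - n + 1)]
counts = np.zeros(T, dtype=int)
for w in windows:
    for t in w:
        counts[t] += 1
print(counts)  # [1 2 3 4 4 4 4 3 2 1]
```

So interior timesteps are indeed seen n times per epoch, while the first and last n-1 timesteps are seen fewer times.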

I’m thankful for any recommendations!