Is Bidirectional LSTM cheating?

I am confused about how a bidirectional LSTM works. I will post a link to the architecture of a bidirectional LSTM. You can see that for each time step the network produces an output based on both past and future information with respect to that time step. If I have a sequence of N elements and I pass it to the LSTM, each time step t produces an output that is roughly output(t) = [output_from_the_past(0:t), output_from_the_future(N:t)]. So each hidden state has a part coming from the forward direction (from the past to the future) and a part coming from the backward direction (from the future to the past).

Now, suppose I have a sequence of N elements and I want to predict the next value at each time step. I will have this configuration:

  • SEQUENCE = [0,1,…,N]
  • INPUT = [0,1,…,N-1]
  • TARGET= [1,2,…,N]
  • OUTPUT = [1_tilde, 2_tilde, …, N_tilde]

My network will produce an output whose length equals the length of the input, because for each time step I get an output (the hidden state produced at that time step), but each of these outputs contains information from both the past and the future (as described above). So my question is: to predict the next value I am using information from the past and the future, including information about the very value I want to predict (see the sketch below). Is the bidirectional LSTM cheating? Or is there something I didn't get?
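To make the setup concrete, here is a minimal PyTorch sketch of what I mean (the layer sizes and the `Linear` prediction head are placeholders I chose just for illustration):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration
seq_len, batch, n_features, hidden = 8, 1, 1, 16

# Bidirectional LSTM: each time step's output concatenates the forward
# and backward hidden states, so out[t] depends on inputs 0..t (forward)
# AND inputs t..seq_len-1 (backward).
lstm = nn.LSTM(input_size=n_features, hidden_size=hidden, bidirectional=True)
head = nn.Linear(2 * hidden, 1)  # maps each time step's output to a prediction

x = torch.randn(seq_len, batch, n_features)  # INPUT  = sequence[0 : N-1]
target = torch.randn(seq_len, batch, 1)      # TARGET = sequence[1 : N]

out, _ = lstm(x)    # out shape: (seq_len, batch, 2 * hidden)
pred = head(out)    # one prediction per time step

# The backward half of out[t] has already read x[t+1], which is exactly
# the value that the prediction at step t is supposed to produce.
print(out.shape)   # torch.Size([8, 1, 32])
print(pred.shape)  # torch.Size([8, 1, 1])
```

In this sketch the prediction at step t is computed from out[t], whose backward half has already consumed x[t+1] = target[t], which is what makes me think it is "cheating".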

Here are some related links: