I am confused about how a bidirectional LSTM works. I will post a link to the architecture of a bidirectional LSTM below. You can see that for each time step the network produces an output based on both past and future information relative to that step. If I pass a sequence of N elements to the LSTM, the output at time step t is something like output(t) = [h_forward(t), h_backward(t)], where h_forward(t) summarizes the inputs from 0 up to t (past to future) and h_backward(t) summarizes the inputs from N back down to t (future to past). So each hidden state has a part coming from the forward direction (past to future) and a part coming from the backward direction (future to past). Now, suppose I have a sequence of N elements and I want to predict the next value at each time step. I will have this configuration:
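To see this concatenation concretely, here is a minimal PyTorch sketch (the hidden size and sequence length are arbitrary choices of mine): perturbing only the last input changes the backward half of the earliest output, while its forward half stays untouched.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Bidirectional LSTM: each output step is [h_forward(t), h_backward(t)]
lstm = nn.LSTM(input_size=1, hidden_size=4, bidirectional=True)

x = torch.randn(10, 1, 1)   # (seq_len, batch, features)
out, _ = lstm(x)
print(out.shape)            # torch.Size([10, 1, 8]) -- hidden size doubled

# Perturb only the LAST input and run again
x2 = x.clone()
x2[-1] += 1.0
out2, _ = lstm(x2)

# The forward half of the FIRST output is unchanged (it has only seen x[0])...
assert torch.allclose(out[0, 0, :4], out2[0, 0, :4])
# ...but the backward half of the FIRST output already depends on the LAST input.
assert not torch.allclose(out[0, 0, 4:], out2[0, 0, 4:])
```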
- SEQUENCE = [0,1,…,N]
- INPUT = [0,1,…,N-1]
- TARGET= [1,2,…,N]
- OUTPUT = [1_tilde, 2_tilde, …, N_tilde]

My network will produce an output of the same length as the input, because each time step yields an output (the hidden state produced at that step), but each of them contains information from both the past and the future (as described above). So my question is: to predict the next value I am using information from both the past and the future, including information about the very value I want to predict. Is the bidirectional LSTM cheating? Or is there something I didn't get?
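To make the suspected leakage explicit, here is a toy sketch of my setup (not a real LSTM: each "state" just records the set of inputs that influenced it, which mirrors the information flow of a bidirectional RNN). At every step except the last, the target value is already inside the output's backward half:

```python
def toy_bidirectional(seq):
    """Toy 'bidirectional RNN': each state is the set of inputs
    it has seen so far, making the information flow explicit."""
    fwd, seen = [], set()
    for x in seq:               # forward pass: past -> future
        seen = seen | {x}
        fwd.append(seen)
    bwd, seen = [], set()
    for x in reversed(seq):     # backward pass: future -> past
        seen = seen | {x}
        bwd.append(seen)
    bwd.reverse()
    # output at step t concatenates both directions
    return [f | b for f, b in zip(fwd, bwd)]

inputs = [0, 1, 2, 3, 4]    # x[0..N-1]
targets = [1, 2, 3, 4, 5]   # x[1..N]
outs = toy_bidirectional(inputs)

# At every step except the last, the target is already visible to the model:
for t in range(len(inputs) - 1):
    assert targets[t] in outs[t]
# Only the final target (5) was never part of the input:
assert targets[-1] not in outs[-1]
```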
Here are some related links:
- Bidirectional LSTM architecture: https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
- PyTorch documentation: LSTM — PyTorch 1.12 documentation