As the title states: what is the difference between using the hidden state and the output of the last cell/time step?
I have gone through various tutorials and code that utilise RNNs (both GRU and LSTM) for tasks like Seq2Seq and text classification.
output, hidden = rnn(...)  # rnn = GRU/LSTM
For Seq2Seq/auto-encoders etc., the output of the encoder is ignored and the hidden state of its last cell is used as the initial hidden state of the decoder. The decoder then uses its output to predict each word one by one.
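To make the encoder→decoder hand-off concrete, here is a minimal sketch (module names and sizes are hypothetical, using PyTorch's `nn.GRU`):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes: batch 2, source length 5, embedding dim 4, hidden dim 8.
encoder = nn.GRU(input_size=4, hidden_size=8, batch_first=True)
decoder = nn.GRU(input_size=4, hidden_size=8, batch_first=True)

src = torch.randn(2, 5, 4)           # embedded source sequence
enc_output, h_enc = encoder(src)     # enc_output is ignored in plain Seq2Seq
dec_in = torch.randn(2, 1, 4)        # embedded first decoder token (e.g. <sos>)
dec_out, h = decoder(dec_in, h_enc)  # decoder starts from the encoder's hidden
print(dec_out.shape)                 # torch.Size([2, 1, 8])
```

(With an LSTM the hand-off is the tuple `(h_n, c_n)` instead of a single tensor.)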
However, for the text classification task the output of the last cell is used to predict the label, after passing it through a feed-forward layer and some activation.
Why is the output of the last cell preferred over the hidden state in the text classification task? Doesn't the hidden state represent the whole sentence? Or are they interchangeable?