Initialize LSTM with sentence embedding

BramVanroy · September 7, 2020, 9:17pm

For a very specific task, I want to try out something that is basically an encoder-decoder architecture using LSTM without attention, but where we do not have an encoder. Instead, we get a sentence embedding of the input. Considering that in a full encoder-decoder architecture we also just pass a single representation to the decoder (rather than all tokens, as in transformer models), it seems that this should be possible.

The problem is that I am not sure whether the sentence embedding should be passed to the decoder LSTM as the hidden state or as the cell state, and how the other one should be initialized in such a scenario. Fundamentally, it is not clear to me (even after reading tons of documentation) what the difference in meaning is between the cell state and the hidden state. I know the difference in code, but what do those differences represent?

Thanks in advance

vdw · September 8, 2020, 1:16am

You're right: since the output of the encoder is just a sentence embedding, you can take an existing sentence embedding and use only the decoder part.

A couple of days ago, there was a post about the implementation of a paper in which the embedding came from a CNN and was fed into an LSTM decoder. In this paper, as far as I could understand, the authors set the initial hidden state h_0 from the embedding and zeroed the initial cell state c_0. I would simply try this approach, as well as:

use the embedding to set h_0 and c_0
use an nn.GRU, which doesn’t have a cell state

and just see how this affects the results.
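The two LSTM options above (embedding as h_0 with a zeroed c_0, or embedding used for both h_0 and c_0) could be sketched roughly as follows. This is only a minimal sketch, not the paper's implementation: the class name, the `init_cell` flag, the linear projections, and the tanh on h_0 are all assumptions made for illustration, on the premise that the sentence embedding size differs from the LSTM hidden size.

```python
import torch
import torch.nn as nn


class EmbeddingInitLSTMDecoder(nn.Module):
    """Hypothetical single-layer LSTM decoder initialized from a sentence embedding."""

    def __init__(self, emb_dim, hidden_dim, vocab_size, init_cell=False):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        # Assumption: project the sentence embedding to the LSTM hidden size.
        self.to_h0 = nn.Linear(emb_dim, hidden_dim)
        # Optional second projection, used only when c_0 is also set from the embedding.
        self.to_c0 = nn.Linear(emb_dim, hidden_dim) if init_cell else None
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, sent_emb, tokens):
        # sent_emb: (batch, emb_dim); tokens: (batch, seq_len)
        # nn.LSTM expects states shaped (num_layers, batch, hidden_dim),
        # even when batch_first=True.
        h0 = torch.tanh(self.to_h0(sent_emb)).unsqueeze(0)
        if self.to_c0 is not None:
            c0 = self.to_c0(sent_emb).unsqueeze(0)  # embedding sets h_0 and c_0
        else:
            c0 = torch.zeros_like(h0)  # embedding sets h_0 only, c_0 is zeroed
        output, _ = self.lstm(self.token_emb(tokens), (h0, c0))
        return self.out(output)  # (batch, seq_len, vocab_size)


sent_emb = torch.randn(4, 16)            # stand-in for real sentence embeddings
tokens = torch.randint(0, 20, (4, 5))    # stand-in for decoder input token ids
zero_cell = EmbeddingInitLSTMDecoder(16, 8, 20, init_cell=False)
both_states = EmbeddingInitLSTMDecoder(16, 8, 20, init_cell=True)
logits_a = zero_cell(sent_emb, tokens)
logits_b = both_states(sent_emb, tokens)
```

If the embedding already has the same size as the hidden state, the projections could presumably be dropped and the embedding used directly; which variant works better is an empirical question.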

BramVanroy · September 10, 2020, 10:02am

Thanks, very useful! I completely forgot about GRU only having a hidden state, so I am going to try that for now. It should be a bit easier than fiddling with the LSTM in this case.
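For reference, the GRU variant might look something like the sketch below. The class name, the linear projection, and the tanh are hypothetical choices for illustration, not something taken from the thread; the only point is that nn.GRU takes a single initial hidden state, so there is no cell state to decide on.

```python
import torch
import torch.nn as nn


class EmbeddingInitGRUDecoder(nn.Module):
    """Hypothetical single-layer GRU decoder initialized from a sentence embedding."""

    def __init__(self, emb_dim, hidden_dim, vocab_size):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, hidden_dim)
        self.gru = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Assumption: project the sentence embedding to the GRU hidden size.
        self.to_h0 = nn.Linear(emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, sent_emb, tokens):
        # sent_emb: (batch, emb_dim); tokens: (batch, seq_len)
        # nn.GRU expects h_0 shaped (num_layers, batch, hidden_dim).
        h0 = torch.tanh(self.to_h0(sent_emb)).unsqueeze(0)
        output, _ = self.gru(self.token_emb(tokens), h0)
        return self.out(output)  # (batch, seq_len, vocab_size)


gru_decoder = EmbeddingInitGRUDecoder(emb_dim=16, hidden_dim=8, vocab_size=20)
gru_logits = gru_decoder(torch.randn(4, 16), torch.randint(0, 20, (4, 5)))
```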