For a very specific task, I want to try out something that is basically an encoder-decoder architecture using LSTM without attention, but where we do not have an encoder. Instead, we get a sentence embedding of the input. Considering that in a full encoder-decoder architecture we also just pass a single representation to the decoder (rather than all tokens, as in transformer models), it seems that this should be possible.
The problem I am having is that I am not sure whether the sentence embedding should be passed as the hidden state or as the cell state of the decoder LSTM, and how the other one should be initialized in such a scenario. Fundamentally, it is not clear to me (even after reading tons of documentation) what the difference in meaning is between the cell state and the hidden state. I know the difference in code, but what do those differences represent?
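To make the question concrete, here is a minimal PyTorch sketch of what I mean (all dimensions and the random "sentence embedding" are placeholders; in reality the embedding would come from some pretrained sentence encoder). This shows one of the two options I am unsure about: the sentence embedding as the initial hidden state `h0`, with the cell state `c0` zero-initialized.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions, just for illustration.
embed_dim = 32    # size of the token embeddings fed into the decoder
hidden_dim = 64   # LSTM hidden size == sentence embedding size
batch_size = 4
seq_len = 7

decoder = nn.LSTM(embed_dim, hidden_dim, num_layers=1, batch_first=True)

# Placeholder for the output of some sentence encoder.
sentence_emb = torch.randn(batch_size, hidden_dim)

# Option A: sentence embedding as initial hidden state h0,
# zeros as initial cell state c0.
# Both must have shape (num_layers, batch, hidden_dim).
h0 = sentence_emb.unsqueeze(0)
c0 = torch.zeros_like(h0)

# Option B would be the reverse: (torch.zeros_like(c0), sentence_emb.unsqueeze(0)).

decoder_inputs = torch.randn(batch_size, seq_len, embed_dim)
outputs, (hn, cn) = decoder(decoder_inputs, (h0, c0))
print(outputs.shape)  # torch.Size([4, 7, 64])
```

Both options run without errors, which is exactly why I cannot tell from the mechanics alone which one is semantically the right place to inject the sentence representation.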
Thanks in advance