Consider the case of machine translation using an encoder-decoder architecture, where nn.LSTM is used as both the encoder and the decoder. Assume the number of nn.LSTMCell units in both is 5.
For the encoder, during the forward pass we send a batch of sentences, and for each sentence, word_i is passed as input to LSTMCell_i. The LSTM finally returns its hidden state to the decoder.
For the decoder, for each sentence a loop is run whose length is the length of the sentence we want to generate. Each word is generated by passing the previous word (and the hidden state) through the LSTM. So the 5-layer LSTM is used to encode a whole sentence in the encoder, while in the decoder it is used to generate one word at a time. Does this mean that while decoding, each word has to pass through all 5 LSTMCells, and the output of the final cell predicts the word?
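To make the question concrete, here is a rough sketch of the decoder loop I have in mind (all sizes and names here are made up for illustration, not taken from any real model):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, just for illustration
vocab_size, emb_dim, hidden_dim, num_layers = 10, 4, 6, 5

embedding = nn.Embedding(vocab_size, emb_dim)
decoder = nn.LSTM(emb_dim, hidden_dim, num_layers=num_layers)
out_proj = nn.Linear(hidden_dim, vocab_size)

# Pretend these states came from the encoder (batch size 1)
h = torch.zeros(num_layers, 1, hidden_dim)
c = torch.zeros(num_layers, 1, hidden_dim)

prev_word = torch.tensor([0])            # e.g. a <sos> token id
generated = []
for _ in range(7):                       # generate a 7-word sentence
    emb = embedding(prev_word).unsqueeze(0)   # (seq_len=1, batch=1, emb_dim)
    out, (h, c) = decoder(emb, (h, c))        # one time step per call
    prev_word = out_proj(out.squeeze(0)).argmax(dim=-1)  # greedy pick
    generated.append(prev_word.item())
```

Here each call to `decoder` processes a single time step, and the input passes through all 5 stacked layers before `out_proj` predicts the next word.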
Can you give a little more detail?
So the final prediction for each word comes from the output of the last LSTMCell? And does this mean images like this about LSTMs are completely misleading? (They show each word being fed to a different LSTMCell.)
That’s probably where your misunderstanding lies. There is not a list of cells, but only one cell that is used recurrently until a stop condition is met – for the encoder, that is when the last word/item in the sequence has been processed.
The output of the cell at time step i is used as input to the same cell at the next time step i+1.
The illustrations you’ve linked just show the “unrolled” network to better visualize the sequential nature of the processing. All the cells are indeed the same cell. See the image below: both representations show the same network – on the left with the recurrent connection, and on the right unrolled to better see how the sequence is processed.
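You can see this directly in code. A minimal sketch (sizes are arbitrary): one single nn.LSTMCell object is applied at every time step, which is exactly what the unrolled pictures depict:

```python
import torch
import torch.nn as nn

# One single cell, reused ("unrolled") at every time step
cell = nn.LSTMCell(input_size=8, hidden_size=16)

seq = torch.randn(5, 3, 8)      # (seq_len=5, batch=3, features=8)
h = torch.zeros(3, 16)          # initial hidden state
c = torch.zeros(3, 16)          # initial cell state

for t in range(seq.size(0)):
    # the very same `cell` object processes every step;
    # the diagrams just draw this loop as 5 separate boxes
    h, c = cell(seq[t], (h, c))

print(h.shape)  # final hidden state, shape (3, 16)
```

The weights are shared across time steps because there is only one set of weights – drawing 5 boxes in a diagram does not mean 5 separate cells exist.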