Multilayer LSTM & OpenNMT-py's `StackedLSTM`


I am wondering why OpenNMT-py defines a `StackedLSTM` class given that torch's native LSTM already has a `num_layers` param.

What justifies this choice?
Is it in order to apply dropout between layers?

`torch.nn.LSTM` doesn't support an attention mechanism, so attention has to be implemented manually on top of `torch.nn.LSTMCell`. However, `torch.nn.LSTMCell` doesn't have a `num_layers` param. Thus, OpenNMT-py defines a `StackedLSTM`, which supports attention and multiple layers, as an extension of `torch.nn.LSTMCell`.
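To make the "stacking `LSTMCell`s manually" point concrete, here is a minimal sketch of what such a stacked cell could look like. This is an illustrative reimplementation of the idea, not OpenNMT-py's actual code: it stacks `nn.LSTMCell`s, applies dropout between layers (but not after the last one), and exposes a single-timestep interface, which is what makes per-step attention and input feeding possible.

```python
import torch
import torch.nn as nn

class StackedLSTMSketch(nn.Module):
    """Illustrative stack of LSTMCells with inter-layer dropout.

    nn.LSTMCell has no num_layers param, so the stacking is done by
    hand; each call processes ONE timestep, so the caller is free to
    inject attention context between steps.
    """
    def __init__(self, num_layers, input_size, hidden_size, dropout=0.3):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.layers = nn.ModuleList()
        for _ in range(num_layers):
            self.layers.append(nn.LSTMCell(input_size, hidden_size))
            input_size = hidden_size  # next layer consumes this layer's output

    def forward(self, x, hidden):
        # hidden: (h, c), each of shape (num_layers, batch, hidden_size)
        h_prev, c_prev = hidden
        h_next, c_next = [], []
        for i, cell in enumerate(self.layers):
            h, c = cell(x, (h_prev[i], c_prev[i]))
            # dropout between layers, but not on the final output
            x = h if i == len(self.layers) - 1 else self.dropout(h)
            h_next.append(h)
            c_next.append(c)
        return x, (torch.stack(h_next), torch.stack(c_next))
```

A decoder would call this once per target token, weaving attention in between calls; `nn.LSTM` with `num_layers` would instead consume the whole sequence at once, leaving no hook for that.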

Hey, thanks for your reply. The post was flagged by mistake.

The thing is, the attention mechanism isn't part of the LSTM anyway (both conceptually and in the code). In ONMT-py, the decoder LSTM feeds its output into the attention layer at each timestep.

Therefore, I don’t think that the difference comes from attention.

Oh, my bad. I meant that the context vector, the output of the attention mechanism, is used as an additional input to the LSTM (Bahdanau et al.). But OpenNMT uses a different attention strategy, from Luong et al., and I think the goal is to support input feeding (feeding the context vector at each time step as additional input).
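The input-feeding loop can be sketched in a few lines. This is a hedged illustration, not OpenNMT-py's decoder: the names (`decode_step`, `emb_dim`, the `tanh` placeholder standing in for real attention over encoder states) are made up for the example. The one essential point is that the previous step's attentional vector is concatenated with the current embedding, so the `LSTMCell` input size is `emb_dim + hidden_size`, and this per-step concatenation is exactly what a whole-sequence `nn.LSTM` call cannot do.

```python
import torch
import torch.nn as nn

emb_dim, hidden_size, batch = 8, 16, 4
# recurrent input = current embedding + previous attentional vector
cell = nn.LSTMCell(emb_dim + hidden_size, hidden_size)

def decode_step(emb_t, prev_attn_out, state):
    # input feeding: concatenate last step's attentional vector
    rnn_input = torch.cat([emb_t, prev_attn_out], dim=1)
    h, c = cell(rnn_input, state)
    # a real decoder would attend over encoder states here; this
    # tanh is only a placeholder for attention(h, encoder_memory)
    attn_out = torch.tanh(h)
    return attn_out, (h, c)

state = (torch.zeros(batch, hidden_size), torch.zeros(batch, hidden_size))
attn_out = torch.zeros(batch, hidden_size)  # no context before step 0
for t in range(3):
    emb_t = torch.randn(batch, emb_dim)
    attn_out, state = decode_step(emb_t, attn_out, state)
```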