Hi,
I am wondering why OpenNMT-py defines a StackedLSTM class, given that torch's native LSTM has a num_layers param.
What justifies this choice?
Is it in order to apply dropout between layers?
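For reference, this is roughly what the built-in multi-layer LSTM I'm referring to looks like (sizes are arbitrary); PyTorch applies dropout between the stacked layers itself:

```python
import torch
import torch.nn as nn

# torch's native LSTM: num_layers stacks the layers internally, and
# dropout is applied between layers (but not after the last one)
rnn = nn.LSTM(input_size=500, hidden_size=500, num_layers=2, dropout=0.3)

x = torch.randn(10, 32, 500)    # (seq_len, batch, input_size)
output, (h_n, c_n) = rnn(x)     # the whole sequence is processed in one call
```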
torch.nn.LSTM doesn't support an attention mechanism, so attention has to be implemented manually on top of torch.nn.LSTMCell. However, torch.nn.LSTMCell doesn't have a num_layers param. Thus, OpenNMT-py defines a StackedLSTM, which supports attention and multiple layers, as an extension of torch.nn.LSTMCell.
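To make that concrete, here is a minimal sketch of the idea behind StackedLSTM: a stack of nn.LSTMCell layers advanced one time step at a time, with dropout applied between layers (names and sizes are illustrative, not the exact OpenNMT-py code):

```python
import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    """Run several LSTMCell layers for a single time step."""
    def __init__(self, num_layers, input_size, rnn_size, dropout):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.num_layers = num_layers
        self.layers = nn.ModuleList()
        for _ in range(num_layers):
            self.layers.append(nn.LSTMCell(input_size, rnn_size))
            input_size = rnn_size

    def forward(self, step_input, hidden):
        # step_input: (batch, input_size); hidden: (h_0, c_0),
        # each of shape (num_layers, batch, rnn_size)
        h_0, c_0 = hidden
        h_1, c_1 = [], []
        for i, layer in enumerate(self.layers):
            h_i, c_i = layer(step_input, (h_0[i], c_0[i]))
            step_input = h_i
            if i + 1 != self.num_layers:
                # dropout between layers, like nn.LSTM's dropout arg
                step_input = self.dropout(step_input)
            h_1.append(h_i)
            c_1.append(c_i)
        return step_input, (torch.stack(h_1), torch.stack(c_1))
```

Because the cell only advances one step at a time, the decoder loop is free to run attention (or anything else) between time steps, which nn.LSTM's fused multi-step forward doesn't allow.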
Hey, thanks for your reply. The post was flagged by mistake.
The thing is, the attention mechanism isn't part of the LSTM anyway (both conceptually and in the code). In ONMT-py, the decoder LSTM feeds its output into the attention layer at each timestep, see https://github.com/pltrdy/OpenNMT-py/blob/master/onmt/Models.py#L108
Therefore, I don’t think that the difference comes from attention.
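For what it's worth, here is a rough sketch of that flow, with a Luong-style "general" attention applied to the per-step decoder output (the class name and shapes are mine, for illustration only):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalAttention(nn.Module):
    """Luong-style 'general' attention: score(h_t, h_s) = h_t^T W h_s."""
    def __init__(self, dim):
        super().__init__()
        self.linear_in = nn.Linear(dim, dim, bias=False)
        self.linear_out = nn.Linear(2 * dim, dim, bias=False)

    def forward(self, dec_out, enc_outs):
        # dec_out: (batch, dim), one decoder step; enc_outs: (batch, src_len, dim)
        scores = torch.bmm(enc_outs, self.linear_in(dec_out).unsqueeze(2))  # (batch, src_len, 1)
        align = F.softmax(scores, dim=1)
        context = torch.bmm(align.transpose(1, 2), enc_outs).squeeze(1)     # (batch, dim)
        # attentional hidden state: combine the context with the decoder output
        return torch.tanh(self.linear_out(torch.cat([context, dec_out], dim=1)))
```

So the attention sits outside the recurrent cell: it just consumes the cell's output at each step.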
Oh, my bad. I meant that the context vector, i.e. the output of the attention mechanism, is used as an additional input to the LSTM (Bahdanau et al.). But OpenNMT uses a different attention strategy, from Luong et al., and I think the goal is to support input_feed (feeding the context vector at each time step as additional input), as in the sketch below.
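A sketch of that loop (assuming the StackedLSTM and GlobalAttention sketches above; sizes are made up) would look something like this:

```python
emb_dim, rnn_size = 500, 500
cell = StackedLSTM(num_layers=2, input_size=emb_dim + rnn_size,
                   rnn_size=rnn_size, dropout=0.3)
attn = GlobalAttention(rnn_size)

def decode(tgt_embs, enc_outs, hidden):
    # tgt_embs: (tgt_len, batch, emb_dim); enc_outs: (batch, src_len, rnn_size)
    batch = tgt_embs.size(1)
    input_feed = tgt_embs.new_zeros(batch, rnn_size)    # previous attentional vector
    outputs = []
    for emb_t in tgt_embs.split(1, dim=0):
        # input feeding: concatenate the previous attention output
        # to the current target embedding at every time step
        dec_in = torch.cat([emb_t.squeeze(0), input_feed], dim=1)
        rnn_out, hidden = cell(dec_in, hidden)
        input_feed = attn(rnn_out, enc_outs)            # new attentional vector
        outputs.append(input_feed)
    return torch.stack(outputs), hidden                 # (tgt_len, batch, rnn_size)
```

This per-step concatenation is exactly what a single call to nn.LSTM over the whole sequence can't do, hence the StackedLSTM built from LSTMCells.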