Attention with LSTM

I’m trying to add an attention mechanism to an LSTM encoder-decoder. If I understand correctly, the idea is to compute a context vector at every time step of the decoder and use it, together with the previously predicted word, to predict the next word.
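
For concreteness, here is a minimal sketch of how I picture the context vector being computed at one decoder step, assuming simple dot-product (Luong-style) attention over the encoder outputs; the function and the shapes are just my own illustration:

    import torch
    import torch.nn.functional as F

    def compute_context(decoder_hidden, encoder_outputs):
        # decoder_hidden:  (batch, hidden_size)          current decoder hidden state
        # encoder_outputs: (batch, src_len, hidden_size) all encoder hidden states
        # dot-product score between the decoder state and every encoder output
        scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)  # (batch, src_len)
        weights = F.softmax(scores, dim=1)                                           # (batch, src_len)
        # context vector = attention-weighted sum of the encoder outputs
        return torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)           # (batch, hidden_size)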

Now, an LSTM takes as input the previous hidden state, the previous cell state, and an input vector. So I have to combine the last predicted word vector and the context vector before feeding them to the LSTM. Is that correct? If so, what is the standard way of doing that? Something like this?

    non_linear_function(torch.mm(weight1, prev_output)
                        + torch.mm(weight2, context_vector))
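
To be concrete, the full decoder step I have in mind looks roughly like the sketch below, where the previous word embedding and the context vector are concatenated and projected back to the LSTM input size; the layer names, sizes, and the ReLU non-linearity are my own guesses, not taken from any particular implementation:

    import torch
    import torch.nn as nn

    embed_size, hidden_size, vocab_size = 256, 512, 10000  # illustrative sizes

    embedding = nn.Embedding(vocab_size, embed_size)
    attn_combine = nn.Linear(embed_size + hidden_size, hidden_size)
    decoder_lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)

    def decoder_step(prev_token, context_vector, hidden, cell):
        # prev_token:     (batch,)                index of the previously predicted word
        # context_vector: (batch, hidden_size)    output of the attention step
        # hidden, cell:   (1, batch, hidden_size) previous decoder LSTM states
        prev_embedded = embedding(prev_token)                         # (batch, embed_size)
        combined = torch.cat((prev_embedded, context_vector), dim=1)  # (batch, embed_size + hidden_size)
        lstm_input = torch.relu(attn_combine(combined))               # (batch, hidden_size)
        output, (hidden, cell) = decoder_lstm(lstm_input.unsqueeze(1), (hidden, cell))
        return output.squeeze(1), hidden, cell

Note that concatenating and then applying a single linear layer is mathematically the same as the two separate weight matrices in the snippet above, since the linear layer's weight matrix just splits into weight1 and weight2.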

There is a very nice introductory tutorial on GitHub: https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation.ipynb

You can also see a more efficient implementation of attention on the OpenNMT GitHub repository: https://github.com/OpenNMT/OpenNMT-py/blob/master/onmt/modules/GlobalAttention.py
