Attention model in the seq2seq tutorial vis-à-vis Bahdanau et al. (2014)

While working through the tutorial, I ran into some difficulty aligning the tutorial material with the original paper by Bahdanau et al. I may be misinterpreting, but it seems to me that the original paper computes the alignment between the encoder hidden state (h) and the previous decoder hidden state (s), i.e. a(s_{i-1}, h_j), while the tutorial seems to compute a(s_{i-1}, y_{j-1}), where y is the output.
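
For concreteness, here is a minimal NumPy sketch of the scoring step as I read it in the paper (my own illustration, not the tutorial's code): the alignment energy e_{ij} = a(s_{i-1}, h_j) is produced by a small feed-forward network over the previous decoder state and each encoder annotation, roughly v_a^T tanh(W_a s_{i-1} + U_a h_j). The weight names follow the paper's notation; the shapes are made up for illustration.

```python
import numpy as np

def bahdanau_alignment(s_prev, H, W_a, U_a, v_a):
    """Additive alignment in the style of Bahdanau et al. (2014).

    s_prev : (n,)    previous decoder hidden state s_{i-1}
    H      : (T, m)  encoder hidden states (annotations) h_1 .. h_T
    W_a    : (p, n)  projection of the decoder state
    U_a    : (p, m)  projection of the encoder states
    v_a    : (p,)    combines both projections into a scalar energy
    """
    # e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), computed for every source position j
    energies = np.tanh(s_prev @ W_a.T + H @ U_a.T) @ v_a   # shape (T,)
    # alpha_ij = softmax over the source positions
    alphas = np.exp(energies - energies.max())
    alphas /= alphas.sum()
    # context vector c_i = sum_j alpha_ij * h_j
    context = alphas @ H                                    # shape (m,)
    return alphas, context

# Toy shapes, purely illustrative
rng = np.random.default_rng(0)
T, n, m, p = 5, 8, 8, 10
s_prev = rng.standard_normal(n)
H = rng.standard_normal((T, m))
W_a = rng.standard_normal((p, n))
U_a = rng.standard_normal((p, m))
v_a = rng.standard_normal(p)
alphas, context = bahdanau_alignment(s_prev, H, W_a, U_a, v_a)
print(alphas, context.shape)
```

The point of confusion is just the second argument to the scoring function: the paper scores against the encoder hidden states h_j, whereas the tutorial appears to score against the previous outputs y_{j-1}.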

The alignments that the code produces are clearly very good, so the mechanism is evidently quite flexible in how it can be formulated. Still, I was wondering whether it would help resolve confusion if the tutorial were implemented more closely to the paper's architecture (including the bidirectional RNN encoder and so on).

I had the same question. It does look like the tutorial differs from the paper, but I'd love some confirmation, and also the reason for doing it that way (is it from a different paper or implementation?).