In the official Pytorch seq2seq tutorial, there is code for an Attention Decoder that I cannot understand/think might contain a mistake.

It computes the attention weights at each time step by concatenating the output and the hidden state at this time, and then multiplying by a matrix to get a vector of size equal to the output sequence length. Note, these attention weights don’t depend on the encoder sequence (named encoder_outputs in the code), which I think it should.

Also, the paper cited in the tutorial, lists three different score functions that can be used to compute attention weights (section 3.1 in the paper). None of these functions is just concatenating and multiplying by a matrix.

So it seems to me that the tutorial is mistaken both in the function it applies and the arguments that are passed to this function. Am I missing something?