Hi, I am trying to implement the models from the paper [LSTM-BASED DEEP LEARNING MODELS FOR NONFACTOID ANSWER SELECTION](https://openreview.net/pdf?id=ZY9xwl3PDS5Pk8ELfEzP), specifically the QA-LSTM with attention. I understand the concept of attention as applied in the seq2seq language translation tutorial, but I cannot work out how to carry it over and implement it for this task.

The paper states:

Specifically, given the output vector of biLSTM on the answer side at time step $t$, $h_a(t)$, and the question embedding, $o_q$, the updated vector $\tilde{h}_a(t)$ for each answer token are formulated below.

$$m_{a,q}(t) = \tanh(W_{am} h_a(t) + W_{qm} o_q)$$

$$s_{a,q}(t) \propto \exp(w_{ms}^T m_{a,q}(t))$$

$$\tilde{h}_a(t) = h_a(t) s_{a,q}(t)$$

To implement this, I assume the question embedding is the output of the question LSTM. To get the attention weights, do we then need to loop over an LSTMCell and compute $s_{a,q}(t)$ at each step? In the seq2seq translation task we start with the SOS token, but in this task we do not have any SOS token for the answer.
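One reading of the quoted equations is that no LSTMCell loop or SOS token is needed: each attention weight depends only on the answer biLSTM output at step t and the fixed question embedding, not on any previous attention state, so all time steps can be computed at once. Below is a minimal PyTorch sketch under that reading. The module name `AnswerAttention`, the dimensions, the shared biLSTM, the max pooling, and the cosine similarity score are illustrative assumptions, and the "proportional to exp" relation is interpreted as a softmax over the answer time steps.

```python
import torch
import torch.nn as nn


class AnswerAttention(nn.Module):
    """Hypothetical module: question-conditioned attention over answer biLSTM outputs."""

    def __init__(self, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.W_am = nn.Linear(hidden_dim, attn_dim, bias=False)  # W_am in the quoted equations
        self.W_qm = nn.Linear(hidden_dim, attn_dim, bias=False)  # W_qm
        self.w_ms = nn.Linear(attn_dim, 1, bias=False)           # w_ms

    def forward(self, h_a, o_q, mask=None):
        # h_a: (batch, T_a, hidden_dim) answer biLSTM outputs for all time steps
        # o_q: (batch, hidden_dim) pooled question embedding
        # mask: optional bool tensor (batch, T_a), True for real tokens
        m = torch.tanh(self.W_am(h_a) + self.W_qm(o_q).unsqueeze(1))  # (batch, T_a, attn_dim)
        scores = self.w_ms(m).squeeze(-1)                             # (batch, T_a)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))         # ignore padding
        s = torch.softmax(scores, dim=1)   # assumption: normalize over answer time steps
        h_tilde = h_a * s.unsqueeze(-1)    # reweighted answer outputs
        return h_tilde, s


torch.manual_seed(0)
batch, T_q, T_a, emb_dim, hid = 2, 5, 7, 16, 8

# Assumption: one biLSTM shared by question and answer; inputs are random stand-ins
# for embedded tokens.
lstm = nn.LSTM(emb_dim, hid, batch_first=True, bidirectional=True)
q_out, _ = lstm(torch.randn(batch, T_q, emb_dim))  # (batch, T_q, 2 * hid)
a_out, _ = lstm(torch.randn(batch, T_a, emb_dim))  # (batch, T_a, 2 * hid)

o_q = q_out.max(dim=1).values                      # max pooling over time (assumed)
attn = AnswerAttention(hidden_dim=2 * hid, attn_dim=2 * hid)
h_tilde, s = attn(a_out, o_q)
o_a = h_tilde.max(dim=1).values                    # pool the attended answer outputs
score = torch.cosine_similarity(o_q, o_a, dim=1)   # (batch,) question-answer score
```

The whole answer sequence goes through `nn.LSTM` in one call, and the attention is applied afterwards to its outputs, which is why the decoder-style step-by-step loop from the translation tutorial does not appear here.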

Any guidance would be much appreciated. I am still relatively new to this task and am learning through the papers.