Hi, I’m new to PyTorch and to machine learning in general.
I was following the PyTorch NMT tutorial and wanted to add
an attention mechanism to the network.
But I thought the tutorial code didn’t actually compute similarity scores between
the decoder’s previous hidden state and all of the encoder’s hidden states.
So I wanted to hard-code every step of the score computation myself.
According to Luong et al. (2015), the score is given by
e_ij = s_(i-1) · W_e · h_j
which means the score for (i, j) is
the decoder’s previous hidden state times a linear transformation of the encoder hidden state at time step j.
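To make concrete what I mean, here is a minimal sketch of that score computation (the shapes, sizes, and variable names are just my own assumptions, not the tutorial’s):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
hidden_size = 8
T = 5  # number of encoder time steps

s_prev = torch.randn(hidden_size)          # decoder's previous hidden state s_(i-1)
enc_hiddens = torch.randn(T, hidden_size)  # encoder hidden states h_1 .. h_T

W_e = nn.Linear(hidden_size, hidden_size, bias=False)  # trainable W_e

# score for a single time step j: e_ij = s_(i-1) · (W_e h_j)
j = 2
e_ij = s_prev @ W_e(enc_hiddens[j])

# the same thing for all j at once: nn.Linear applies row-wise to (T, hidden_size)
scores = W_e(enc_hiddens) @ s_prev  # shape (T,), one score per encoder step
```

The single-row version and the batched version give the same value for step j, so the per-j score I describe in words is what `scores[j]` computes.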
So the key problem is:
I need a trainable matrix W_e, but when computing e_ij,
it should only multiply the single row corresponding to time step j.
I don’t know what module to declare for this other than nn.Linear.
Is there any way to declare a linear transformation that is applied to just one row (j)
when computing e_ij, while still being trained at the same time?
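For example, here is roughly what I’m hoping is possible (again, the names and sizes are made up for illustration): only row j is used in the score, but the weight of the nn.Linear should still receive gradients so it can be trained.

```python
import torch
import torch.nn as nn

hidden_size = 4
W_e = nn.Linear(hidden_size, hidden_size, bias=False)  # the matrix I want trained

s_prev = torch.randn(hidden_size)       # decoder's previous hidden state
enc_hiddens = torch.randn(6, hidden_size)  # encoder hidden states

# use only the encoder state at time step j in the score, then backpropagate
j = 3
e_ij = s_prev @ W_e(enc_hiddens[j])
e_ij.backward()

# W_e.weight.grad should now be populated, even though only row j was used
```

If indexing into the encoder states and passing one row through nn.Linear like this still trains W_e, that would be exactly what I need.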