Hi
I have trained an LSTM network (to solve embedding, LM and classification tasks). At inference time I feed in an input text and read the last hidden state of the network as the representation of the input sequence. My forward looks as follows:
def forward(self, input, hidden):
    """Forward pass: embed the input tokens and run them through the LSTM."""
    emb = self.word_embeddings(input)
    # hT is the final hidden state after the last time step
    _, (hT, _) = self.rnn(emb, hidden)
    return hT
I would like to use the LSTM to generate vectors for any pair of sequences s1 and s2 (say h1T and h2T respectively) and use those vectors to compute a similarity score via F.cosine_similarity:
score = F.cosine_similarity(h1T, h2T, dim=1, eps=1e-6).data[0]
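Roughly, the full comparison looks like the sketch below. The encode helper, the vocab dict, the zero (None) initial hidden state, and the batch layout are simplified stand-ins for my actual preprocessing, just to show the flow:

```python
import torch
import torch.nn.functional as F

# Hypothetical helper: map a sentence to a (seq_len, 1) LongTensor of token ids.
# This assumes the default batch_first=False LSTM layout; adjust if needed.
def encode(sentence, vocab):
    ids = [vocab[w] for w in sentence.split()]
    return torch.tensor(ids, dtype=torch.long).unsqueeze(1)

def sentence_vector(model, sentence, vocab, hidden=None):
    # forward returns hT with shape (num_layers, batch, hidden_size);
    # take the top layer's state as the sequence representation.
    hT = model(encode(sentence, vocab), hidden)
    return hT[-1]  # shape: (batch, hidden_size)

h1T = sentence_vector(model, "hospital emergency room", vocab)
h2T = sentence_vector(model, "hospital 2017 budget", vocab)
score = F.cosine_similarity(h1T, h2T, dim=1, eps=1e-6).item()
```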
The problem I am seeing is that the final word in the input sequence dominates the representation. For instance, if I have the LSTM vectors of the following two sentences
s1= hospital emergency room
s2= hospital 2017 budget
then the following test sentence does not score highly against either of them
t1 = hospital emergency policy
whereas the test sentence
t2 = US congressional budget
scores highly against
s2 = hospital 2017 budget
Is there a good explanation for why this happens? I thought the final hidden state would also encode the earlier tokens of the sequence.
thank you