I am using an LSTM with word2vec features to classify sentences. To improve performance, I'd like to try an attention mechanism. However, I can only find resources on how to implement attention for sequence-to-sequence models, not for sequence-to-fixed-output models.
Thus, I have a few questions:
Is it even possible / helpful to use attention for simple classification?
Is there a small working example of how to combine a simple LSTM with attention? I could not find anything helpful; all the code I found is very complicated, uncommented, and written for seq2seq.
Sure, you can use an attention mechanism for seq-to-one tasks.
You can think of seq-to-one as a special case of seq-to-seq with a single-step decoder. The attention mechanism simply computes weights over the encoder's output features, based on those features together with the decoder RNN's last output and last hidden state (the latter two are unnecessary if the decoder is not an RNN). The mechanism itself does not know whether you are doing a seq-to-one or a seq-to-seq task; that is entirely up to you.
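Here is a minimal sketch of that idea in PyTorch (my assumption; you didn't name a framework, and the class name `AttentionLSTMClassifier` is made up for illustration). It scores each LSTM timestep with a small learned layer, softmaxes the scores into attention weights, and classifies the weighted sum of the hidden states:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLSTMClassifier(nn.Module):
    def __init__(self, embed_dim, hidden_dim, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # One attention score per timestep, computed from that timestep's hidden state.
        self.attn = nn.Linear(hidden_dim, 1)
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim), e.g. pretrained word2vec vectors
        outputs, _ = self.lstm(x)                # (batch, seq_len, hidden_dim)
        scores = self.attn(outputs).squeeze(-1)  # (batch, seq_len)
        weights = F.softmax(scores, dim=1)       # attention weights over timesteps
        # Weighted sum of hidden states = the "context" vector fed to the classifier.
        context = (weights.unsqueeze(-1) * outputs).sum(dim=1)  # (batch, hidden_dim)
        return self.out(context)                 # (batch, num_classes) logits

# Dummy usage: 4 sentences of 12 tokens each, with 300-d word2vec features.
model = AttentionLSTMClassifier(embed_dim=300, hidden_dim=128, num_classes=2)
logits = model(torch.randn(4, 12, 300))
print(logits.shape)  # torch.Size([4, 2])
```

Since there is no decoder here, the attention scores depend only on the encoder's hidden states (self-attentive pooling); if you had a decoder, you would mix its last hidden state into the score computation as described above.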
Did you see these examples? You can treat them as an introductory tutorial.