Add an attention mechanism to LSTM

Hi, guys,

I am studying LSTMs, and in the thread Lstm input size, hidden size and sequence lenght I found a good illustration of how an LSTM processes inputs into outputs, so I implemented a sequence-to-label classifier, roughly like the sketch below.
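This is more or less what my current classifier looks like (the hyperparameter names and sizes here are just placeholders, not my real values):

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    """Sequence-to-label: encode the sequence with an LSTM,
    then classify from the last hidden state."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        outputs, (h_n, c_n) = self.lstm(x)  # h_n: (num_layers, batch, hidden_size)
        return self.fc(h_n[-1])             # logits: (batch, num_classes)
```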

However, I would like to add temporal attention, but searching the forum I haven't found the basic ideas needed to implement it, and some of the papers were confusing to me. My rough attempt so far is below.
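From what I understand, instead of using only the last hidden state, I should score every time step of the LSTM output, turn the scores into weights with a softmax over time, and classify from the weighted sum. Something like this sketch (the single-layer additive scoring and the layer names are my own guesses, so please correct me if this is not what "temporal attention" means):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionLSTMClassifier(nn.Module):
    """LSTM classifier with attention over all time steps."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.attn = nn.Linear(hidden_size, 1)  # one score per time step
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        outputs, _ = self.lstm(x)                 # (batch, seq_len, hidden_size)
        scores = self.attn(torch.tanh(outputs))   # (batch, seq_len, 1)
        weights = F.softmax(scores, dim=1)        # normalize over the time axis
        context = (weights * outputs).sum(dim=1)  # (batch, hidden_size)
        return self.fc(context), weights          # logits + attention weights
```

Is this the right general idea, or does temporal attention usually mean something different?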

If someone could share their experience or point me to a useful tutorial, I would really appreciate it.