Attention mechanism

I am trying to implement an attention mechanism. Unfortunately, my current implementation has a higher loss than the model without attention.

Any ideas on how to improve the model, or whether the architecture even makes sense at the moment?

Why are you multiplying the hidden states by the output of the LSTM? What do you mean by a linear decoder? Could you cite the source of this architecture? The standard attention mechanism does not look like this.
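For comparison, a standard (Luong-style) dot-product attention step is usually structured like this. This is a minimal PyTorch sketch, not your architecture; the class name, variable names, and shapes here are my own assumptions for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LuongAttention(nn.Module):
    """Dot-product (Luong) attention over a sequence of encoder outputs."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Combines the context vector with the current decoder state.
        self.attn_combine = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, decoder_hidden: torch.Tensor, encoder_outputs: torch.Tensor):
        # decoder_hidden:  (batch, hidden)
        # encoder_outputs: (batch, seq_len, hidden)

        # Scores are dot products between the decoder state and each encoder output.
        scores = torch.bmm(encoder_outputs, decoder_hidden.unsqueeze(2)).squeeze(2)  # (batch, seq_len)

        # Normalize scores into an attention distribution over source positions.
        weights = F.softmax(scores, dim=1)  # (batch, seq_len)

        # Context vector: attention-weighted sum of encoder outputs.
        context = torch.bmm(weights.unsqueeze(1), encoder_outputs).squeeze(1)  # (batch, hidden)

        # Combine context with the decoder state before predicting the next token.
        combined = torch.tanh(self.attn_combine(torch.cat([context, decoder_hidden], dim=1)))
        return combined, weights
```

The key point is that the softmax weights come from a score between the decoder state and the encoder outputs, and the context vector is then a weighted *sum* over the encoder outputs, not an elementwise multiplication of hidden states with LSTM outputs.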