I am trying to implement Attention Model to lstm, where I am trying to predict a sentiment class given a sequence.

For calculating attention I have chosen a learnable decoder which will get the attention probability wrt the hidden states of the lstm. The decoder is learnable. Lastly the weighted sum of the attentions and hidden states give the context vector.

I am getting very low train accuracy. The model doesn’t seem to learn much. Can you suggest anything which I can do ? What could be the reason for this ?