I am trying to add self-attention on top of an LSTM and have trouble getting it working with packed sequences. I know that for the output (the first return value of the LSTM) we can unpack it and then unsort it back to the original batch order. However, how does this work for the hidden state (the second return value)? Thanks for any help and pointers!
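To make the question concrete, here is a minimal sketch of the setup I mean (dimensions and variable names are made up for illustration): the padded output is unsorted along the batch dimension (dim 0 with `batch_first=True`), but `h_n`/`c_n` have shape `(num_layers * num_directions, batch, hidden)`, so my understanding is they would need to be unsorted along dim 1 instead:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# hypothetical sizes for illustration
batch, max_len, in_dim, hid = 3, 5, 8, 16
lstm = nn.LSTM(in_dim, hid, batch_first=True)

x = torch.randn(batch, max_len, in_dim)
lengths = torch.tensor([5, 2, 4])

# sort sequences by length (required when enforce_sorted=True, the default)
sorted_lens, sort_idx = lengths.sort(descending=True)
unsort_idx = sort_idx.argsort()

packed = pack_padded_sequence(x[sort_idx], sorted_lens, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# unpack the per-step outputs and unsort along the batch dim (dim 0)
out, _ = pad_packed_sequence(packed_out, batch_first=True)
out = out[unsort_idx]

# h_n / c_n: batch is dim 1, so unsort along dim 1 instead
h_n = h_n[:, unsort_idx]
c_n = c_n[:, unsort_idx]
```

With a single-layer unidirectional LSTM, a sanity check is that `h_n[0, b]` should match `out[b, lengths[b] - 1]` for every sequence `b` after unsorting; is indexing `h_n` on dim 1 like this the right way to get that, or is there a cleaner approach?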