How does pack_padded_sequence work with the hidden states?

I am trying to add self-attention on top of an LSTM and am having trouble getting it to work. I understand that the output (the first return value of the LSTM) can be unpacked and unsorted. However, how does this work for the hidden states (the second return value)? Thanks for any help and pointers!
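For reference, here is a minimal sketch of how I currently handle it, assuming a recent PyTorch version where `enforce_sorted=False` takes care of the sorting and unsorting internally. In that case `h_n` already holds the hidden state at each sequence's last valid (non-padded) step, and both the unpacked output and `h_n` come back in the original batch order:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

batch_size, max_len, input_size, hidden_size = 4, 10, 8, 16
x = torch.randn(batch_size, max_len, input_size)        # padded batch
lengths = torch.tensor([10, 7, 5, 2])                    # true length of each sequence

lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

# enforce_sorted=False lets PyTorch sort/unsort internally,
# so no manual permutation bookkeeping is needed.
packed = pack_padded_sequence(x, lengths, batch_first=True, enforce_sorted=False)

packed_out, (h_n, c_n) = lstm(packed)

# Unpack the per-timestep outputs: (batch, max_len, hidden_size),
# already restored to the original batch order.
out, out_lengths = pad_packed_sequence(packed_out, batch_first=True)

# h_n has shape (num_layers * num_directions, batch, hidden_size) and already
# contains the state at each sequence's last valid timestep, also in the
# original batch order, so no extra unsorting is needed.
last_hidden = h_n[-1]                                     # (batch, hidden_size)

# Sanity check: h_n matches the output at each sequence's last valid step.
gathered = out[torch.arange(batch_size), out_lengths - 1]
print(torch.allclose(last_hidden, gathered))              # True
```

Is this the right way to think about it, or do I still need to unsort `h_n` myself somewhere?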

You can look at the fastai implementation of the SelfAttention layer: link
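If it helps, here is a rough, generic sketch (not the fastai code) of attention pooling applied to the unpacked LSTM outputs. The layer name `AdditiveSelfAttention` is just a placeholder for illustration; the main point is masking the padded positions so they receive zero attention weight:

```python
import torch
import torch.nn as nn

class AdditiveSelfAttention(nn.Module):
    """Toy attention pooling over padded LSTM outputs (not the fastai layer)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.score = nn.Linear(hidden_size, 1)

    def forward(self, out, lengths):
        # out: (batch, max_len, hidden_size); lengths: (batch,)
        scores = self.score(out).squeeze(-1)                    # (batch, max_len)
        # Mask the padded timesteps so softmax gives them zero weight.
        positions = torch.arange(out.size(1), device=out.device)
        mask = positions[None, :] < lengths.to(out.device)[:, None]
        scores = scores.masked_fill(~mask, float('-inf'))
        weights = torch.softmax(scores, dim=1)                  # (batch, max_len)
        # Weighted sum over time -> one context vector per sequence.
        return torch.bmm(weights.unsqueeze(1), out).squeeze(1)  # (batch, hidden_size)
```

With this kind of pooling the attention only needs the unpacked output and the lengths; `h_n` is not required unless you want to use it as a query.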