MultiheadAttention after LSTM returns the same output for every input, please take a look!

Here is the issue: I am running into the same problem. I have no idea why this happens or how to solve it. After a few rounds of training, I also inspected the attention scores and found that they were nearly identical across all inputs.
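In case it helps, here is a minimal sketch of the setup being discussed (an `nn.LSTM` feeding an `nn.MultiheadAttention` layer) together with a quick check for output collapse. All layer sizes here are hypothetical placeholders, not the actual model's dimensions; the idea is just to compare outputs and attention weights for two different inputs, since a collapsed model would give near-zero differences for every pair.

```python
# Sketch of LSTM -> MultiheadAttention, with a collapse check.
# Sizes (16, 32, 4 heads, seq len 10) are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
attn = nn.MultiheadAttention(embed_dim=32, num_heads=4, batch_first=True)

def forward(x):
    h, _ = lstm(x)                # h: (batch, seq, hidden)
    out, weights = attn(h, h, h)  # self-attention over the LSTM states
    return out, weights

x1 = torch.randn(1, 10, 16)
x2 = torch.randn(1, 10, 16)
out1, w1 = forward(x1)
out2, w2 = forward(x2)

# If the trained model has collapsed, both norms are ~0 for every input pair.
print("output diff:", torch.norm(out1 - out2).item())
print("attn-weight diff:", torch.norm(w1 - w2).item())
```

With freshly initialized weights the two norms should be clearly nonzero; the symptom in this thread is that after training they shrink toward zero regardless of the input.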

And here is my attention output: