The attention mechanism is added to the model, but the results barely change?

Hello,
What data and task are you testing the approach on? How do you measure performance? Are you using the same metrics as the original papers? There are many things that could be going on.

Are you testing the approach on the same data the original papers used? Could you please point us to those papers and share your implementation?

Note that a large share of published research overstates performance gains. There are many reasons for this, but one of them is that papers which do not show large improvements are simply not accepted as often. This survivorship bias (often called publication bias) leads to over-optimistic evaluations of newly proposed methods.