Encoder-decoder issues

I am attempting to implement an abstractive summarisation deep learning model on the Gigaword dataset. I'm using an LSTM for both the encoder and the decoder, and I've also tried to include Bahdanau attention. On top of this I've added a pointer mechanism so that some words can be copied from the source rather than generated.
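For context, the attention I'm aiming for is the standard additive (Bahdanau) form: score each encoder state against the current decoder state, softmax over source positions, then take the weighted sum as the context vector. A minimal NumPy sketch of that scoring step (names and shapes here are illustrative, not my actual code):

```python
import numpy as np

def bahdanau_attention(dec_state, enc_outputs, W_s, W_h, v):
    """Additive attention: score_i = v^T tanh(W_s s + W_h h_i).

    dec_state:   (H,)   current decoder hidden state s
    enc_outputs: (T, H) encoder hidden states h_1..h_T
    W_s, W_h:    (A, H) projections into the attention space
    v:           (A,)   scoring vector
    """
    scores = np.tanh(enc_outputs @ W_h.T + dec_state @ W_s.T) @ v  # (T,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()            # softmax over source positions
    context = weights @ enc_outputs     # (H,) weighted sum of encoder states
    return context, weights
```

In the pointer mechanism these same attention weights double as the copy distribution over source tokens, mixed with the vocabulary distribution via a generation probability.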

My model does not seem to be training well: both greedy and beam search decoding produce repeated tokens, and the loss shows little to no improvement from batch to batch.

Would anybody be kind enough to have a look at my Colab notebook, or should I paste the code here?