Hey,
I am currently trying to implement an IMDB sentiment classifier with a transformer (encoder blocks). When I remove the encoder blocks from the network, I get better results than with them.
Also, when I plot the attention matrix, which as far as I know should show a vertical line, it just shows some straight lines.
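To make clear what I mean by "attention matrix", here is a minimal NumPy sketch of the attention weights for a single head. This is not the code from my Colab notebook: the function name `attention_weights`, the random inputs, and the sizes are all made up for illustration, and I am assuming plain scaled dot-product attention without any padding mask.

```python
import numpy as np

def attention_weights(q, k):
    """Scaled dot-product attention weights for one head (illustrative helper).

    q, k: arrays of shape (seq_len, d_k).
    Returns an array of shape (seq_len, seq_len) whose rows sum to 1.
    """
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # Subtract the row-wise max before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

# Hypothetical sizes and random inputs, only to get a matrix to look at.
rng = np.random.default_rng(0)
seq_len, d_k = 6, 8
q = rng.normal(size=(seq_len, d_k))
k = rng.normal(size=(seq_len, d_k))

attn = attention_weights(q, k)
print(attn.shape)         # one row per query position, one column per key position
print(attn.sum(axis=-1))  # each row is a probability distribution over keys
```

In this layout rows are query positions and columns are key positions, so a vertical line in the plot would mean that every query puts most of its weight on the same key token. Such a matrix can be drawn with, for example, matplotlib's `imshow`.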
Colab link: