No or negligible change in loss when training a Transformer

I've been working on a text classification problem using an LSTM and a transformer. The LSTM model seems to train fine, but when I train the transformer model, the loss and accuracy stay constant and no training seems to happen.
What am I doing wrong?

Here is the link to the notebook: