Transformer models (encoder side only)

I have the same problem with transformers. See "Transformer model doesn't improve even when fed the same single example over and over".

Did you ever figure out a solution to your problem?