Transformer models (Only encoder side)

calebh (Caleb Helbling) April 13, 2020, 4:26am 3

I have the same problem with transformers. See the thread "Transformer model doesn't improve even when fed the same single example over and over".

Did you ever figure out a solution to your problem?