Transformer model predicts the same token every time

Hello, I am trying to build a masked word prediction model, but whatever input I give the model, its prediction for the masked word is the same. Strangely, the predicted token changes with the number of epochs: for example, it predicts the “the” token for all inputs, and after I train for more epochs it predicts the “we” token for all inputs.

Example:

Epoch 2:
Input:
[2, 42, 43, 44, 45, 46, 28, 35, 47, 48, 8, 3]
 sss of morning at blazing star and the settlement awoke to eee
[2, 42, 1, 44, 45, 46, 28, 35, 47, 48, 8, 3]
 sss of unnk at blazing star and the settlement awoke to eee
Output:
[2, 42, 35, 44, 45, 46, 28, 35, 47, 48, 8, 3]
 sss of the at blazing star and the settlement awoke to eee

Epoch 10:
Input:
[2, 42, 43, 44, 45, 46, 28, 35, 47, 48, 8, 3]
 sss of morning at blazing star and the settlement awoke to eee
[2, 42, 1, 44, 45, 46, 28, 35, 47, 48, 8, 3]
 sss of unnk at blazing star and the settlement awoke to eee
Output:
[2, 42, 100, 44, 45, 46, 28, 35, 47, 48, 8, 3]
 sss of we at blazing star and the settlement awoke to eee
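
The masking step and the symptom shown in these examples can be sketched roughly as follows. This is only an illustration, not the actual training code: the helper names `mask_position` and `prediction_collapse` are hypothetical, the mask ID 1 is taken from the "unnk" token in the examples, and the list of predictions is made-up data standing in for the model's outputs on several different inputs.

```python
from collections import Counter

MASK_ID = 1  # assumption: the ID displayed as "unnk" in the examples


def mask_position(token_ids, position, mask_id=MASK_ID):
    """Return a copy of token_ids with one position replaced by the mask ID."""
    masked = list(token_ids)
    masked[position] = mask_id
    return masked


def prediction_collapse(predicted_ids):
    """Return the most common predicted ID and the fraction of inputs it covers."""
    counts = Counter(predicted_ids)
    token_id, count = counts.most_common(1)[0]
    return token_id, count / len(predicted_ids)


# The sentence from the examples, with "morning" (ID 43) at index 2.
original = [2, 42, 43, 44, 45, 46, 28, 35, 47, 48, 8, 3]
masked = mask_position(original, 2)
print(masked)

# Hypothetical predictions for the masked position of four different inputs.
# A fraction of 1.0 means the model ignores its input completely.
epoch2_predictions = [35, 35, 35, 35]
print(prediction_collapse(epoch2_predictions))
```

Running `prediction_collapse` on the predictions for a batch of different sentences gives a quick number for how badly the output has collapsed onto a single token at a given epoch.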

What could cause this? The model is not overfitting.