Is the embedding layer in a Transformer trained or not?

Hello, there!

I’ve recently been working on Transformer components (Self-Attention, Multi-Head Attention),
and I’m genuinely curious
whether the embedding layer in the Transformer is trained to capture similarity, like Skip-gram or CBOW,
or whether it’s just randomly initialized vectors.

Does it use pre-trained vectors for its vocabulary?
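
For context, here is a minimal sketch of the layer I mean (assuming PyTorch's `nn.Embedding`; `vocab_size` and `d_model` are just placeholder values):

```python
import torch
import torch.nn as nn

vocab_size, d_model = 10000, 512

# The embedding table starts as random vectors.
emb = nn.Embedding(vocab_size, d_model)

tokens = torch.randint(0, vocab_size, (2, 7))   # (batch, seq_len) of token ids
x = emb(tokens) * (d_model ** 0.5)              # scaled embeddings fed into the encoder

# Is emb.weight meant to be updated by the optimizer together with the rest of
# the model, or loaded from pre-trained vectors (e.g. Skip-gram / CBOW)?
print(emb.weight.requires_grad)  # True by default
```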

Thx for reading.
:pray: :blush: