BERT Embedding Vector

In applications like BERT, does the embedding capture the semantic meaning of the word, or does it essentially learn a pseudo-orthogonal representation that is friendly to the transformer it feeds?
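
To make "capture semantic meaning" concrete, here is a minimal sketch of the kind of probe I have in mind. It assumes the Hugging Face `transformers` package and the `bert-base-uncased` checkpoint, and reads the static input (wordpiece) embedding table directly, before any transformer layers run:

```python
# Probe BERT's raw input embedding table for semantic similarity.
# Assumes Hugging Face `transformers` and the bert-base-uncased checkpoint.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
emb = model.embeddings.word_embeddings.weight  # (vocab_size, embedding_dim)

def vec(token):
    # Look up the static embedding row for a single wordpiece token.
    idx = tokenizer.convert_tokens_to_ids(token)
    return emb[idx]

cos = torch.nn.functional.cosine_similarity
print(cos(vec("king"), vec("queen"), dim=0))   # related pair
print(cos(vec("king"), vec("banana"), dim=0))  # unrelated pair
```

If the embedding is semantic, the related pair should score noticeably higher; if it is merely a pseudo-orthogonal code for the transformer, the two scores should look similar.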

To ask essentially the same question another way: in BERT-like applications, is the embedding equivalent to an orthogonal (e.g., one-hot) vector projected down into a vector of dimension embedding_dim, where the projection is learned?
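
For reference, here is a minimal PyTorch sketch of the equivalence I am asking about (the sizes are those of bert-base-uncased, but the point is general): an embedding lookup is exactly a one-hot vector multiplied by a learned projection matrix.

```python
# An embedding lookup is a one-hot (orthogonal) vector times a
# learned projection matrix; the two paths below coincide.
import torch
import torch.nn.functional as F

vocab_size, embedding_dim = 30522, 768  # bert-base-uncased sizes
embedding = torch.nn.Embedding(vocab_size, embedding_dim)

token_id = torch.tensor([42])  # some arbitrary token index

# Path 1: ordinary table lookup, as an embedding layer does it.
lookup = embedding(token_id)

# Path 2: explicit one-hot vector projected by the learned weight matrix.
one_hot = F.one_hot(token_id, num_classes=vocab_size).float()
projected = one_hot @ embedding.weight  # (1, embedding_dim)

print(torch.allclose(lookup, projected))  # True
```

So mathematically the two views are the same operation; my question is about what the learned projection ends up encoding.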