BERT positional encoding

kevs · May 3, 2021, 2:09am

How is the positional encoding for the BERT model implemented with an embedding layer? As I understand it, sine and cosine waves are used to encode the position of each word in a sentence. Is that what the lookup in weight is doing?

import torch

model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')

model.embeddings.position_embeddings  # this will return the layer I am trying to figure out

model.embeddings.position_embeddings.weight.shape  # what do these dimensions mean?

The shape of the embedding is torch.Size([512, 768]). I would guess that 768 is the embedding dimension, but what is 512? Is that the max length of the sentence?
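
To make the question concrete, here is a minimal sketch of what I think such a layer does, assuming it is a plain learned nn.Embedding indexed by token position rather than a fixed sin/cos table. The sizes mirror the shape above. The names max_positions, hidden_size, seq_len and position_ids are illustrative and not taken from the model's source, and the weights here are randomly initialised, not the pretrained ones.

```python
import torch
import torch.nn as nn

max_positions, hidden_size = 512, 768  # assumed to match the [512, 768] shape above

# A learned lookup table: one trainable 768-dim vector per position index.
position_embeddings = nn.Embedding(max_positions, hidden_size)

seq_len = 10  # hypothetical input length; must be <= max_positions
position_ids = torch.arange(seq_len)  # positions 0, 1, ..., seq_len - 1

# The "lookup in weight" is plain row indexing: row i is the vector for position i.
pos_vectors = position_embeddings(position_ids)

print(pos_vectors.shape)
print(torch.equal(pos_vectors, position_embeddings.weight[:seq_len]))
```

If that assumption holds, 512 would be the number of positions the table has a row for (so the longest token sequence the layer can index), and 768 would be the size of each position vector.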