How is the positional encoding for the BERT model implemented with an embedding layer? As I understand it, sine and cosine waves are used to encode the position of each word in a sentence. Is this what the lookup in the weight matrix is doing?
    import torch

    model = torch.hub.load('huggingface/pytorch-transformers', 'model', 'bert-base-uncased')
    model.embeddings.position_embeddings  # this returns the layer I am trying to figure out
    model.embeddings.position_embeddings.weight.shape  # what do these dimensions mean?
The shape of the embedding is torch.Size([512, 768]). I would guess that 768 is the embedding dimension, but what is 512? Is that the maximum length of the sentence?
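To make the question concrete, here is a minimal sketch of how I assume such a lookup works. It uses a standalone, randomly initialised nn.Embedding with the same shape as above rather than the real BERT weights, and the names max_positions, hidden_size, and seq_len are my own placeholders, not anything taken from the model.

```python
import torch
import torch.nn as nn

# Assumed to match the shape printed above: [512, 768]
max_positions, hidden_size = 512, 768

# Stand-in for model.embeddings.position_embeddings
# (random weights here, not the pretrained BERT ones)
position_embeddings = nn.Embedding(max_positions, hidden_size)

seq_len = 10  # hypothetical sentence length in tokens
position_ids = torch.arange(seq_len).unsqueeze(0)  # shape [1, seq_len]

# The lookup selects one row of the weight matrix per position index
pos_vectors = position_embeddings(position_ids)  # shape [1, seq_len, hidden_size]

print(position_embeddings.weight.shape)
print(pos_vectors.shape)
```

If this picture is right, row i of the weight matrix is just the vector for position i, so the first dimension would cap how many positions can be indexed.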