Question about Embeddings

I would like to build a Transformer from scratch that predicts a certain number in a sequence.
However, I lack some understanding of how embeddings work. I understand the basic concept, but my current understanding does not seem to hold in practice:

Assume I have a tensor of the following shape:

torch.Size([8000, 4]),

where each element is an integer between 0 and 9:

tensor([[9, 9, 7, 8],
        [2, 4, 1, 6],
        [9, 7, 1, 0],
        ...,
        [8, 7, 1, 4],
        [4, 2, 8, 0],
        [9, 1, 4, 7]])

I now want to apply an embedding to that tensor:

embedding_layer = nn.Embedding(10, 500)   # 10 = dictionary size, 500 = embedding dimension
transform = embedding_layer(data)         # data is the [8000, 4] index tensor from above

As expected, the returned shape of transform is torch.Size([8000, 4, 500]), since every element has been transformed into the embedding dimension of 500. But I also expect the dictionary size of 10 to show up somewhere, and I cannot find it. If it exists, I would expect some attribute of the layer showing me 10, bluntly said.
Can someone tell me if my understanding is correct? And if so, can PyTorch show me the size of the vocab?

You can find num_embeddings=10 in the shape of the weight, which acts as a lookup table:

emb = nn.Embedding(num_embeddings=10, embedding_dim=500)
print(emb.weight.shape)
# torch.Size([10, 500])
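If you want the number directly, nn.Embedding also keeps it as an attribute, and the lookup-table behaviour can be checked by comparing the output for an index with the corresponding weight row. A small sketch (same emb as above):

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=500)
print(emb.num_embeddings)   # 10, the dictionary / vocab size
print(emb.embedding_dim)    # 500

# embedding an index is just a row lookup in emb.weight
idx = torch.tensor([3])
print(torch.equal(emb(idx), emb.weight[3].unsqueeze(0)))  # True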

The following code works.

import torch

input = torch.randint(1, 10, [8000, 4])          # indices in [1, 9], all within num_embeddings=10
embedding_layer = torch.nn.Embedding(10, 500)
transform = embedding_layer(input)
print("Final output:", transform.shape)          # torch.Size([8000, 4, 500])

However, we will get an error if we change the upper bound from 10 to 11 in the next part, because then the indices can reach 10, which is out of range for the size of the dictionary of embeddings (num_embeddings=10). You cannot see num_embeddings as a dimension of the transform variable, but it constrains the values the input indices may take.

import torch

input = torch.randint(1, 11, [8000, 4])          # indices in [1, 10]; 10 is out of range for num_embeddings=10
embedding_layer = torch.nn.Embedding(10, 500)
transform = embedding_layer(input)               # raises IndexError: index out of range in self
print("Final output:", transform.shape)
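In practice you can size the layer from the data, so the indices always stay in range. A minimal sketch, assuming the full vocabulary actually appears in the data (otherwise use the known vocab size):

import torch

data = torch.randint(0, 10, [8000, 4])
num_embeddings = int(data.max()) + 1             # indices must lie in [0, num_embeddings - 1]
embedding_layer = torch.nn.Embedding(num_embeddings, 500)
transform = embedding_layer(data)
print(transform.shape)                           # torch.Size([8000, 4, 500])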

I hope this helps :raised_hand:

The embedding layer is learnable, so you can add additional rows along dim=0 to the weight if you add new vocabulary.

with torch.no_grad():
    old_weights = model.embeddings.weight                        # existing [num_embeddings, embedding_dim] table
    new_row = torch.rand(1, old_weights.size(1)) / old_weights.size(1)
    new_weights = torch.cat([old_weights, new_row])
    model.embeddings.weight = torch.nn.Parameter(new_weights)    # re-wrap so the new row is learnable too
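After that the extended layer can embed the new index. A sketch, assuming torch is imported and model.embeddings is an nn.Embedding; updating the num_embeddings attribute keeps the module's metadata consistent (forward only uses the weight):

model.embeddings.num_embeddings = model.embeddings.weight.size(0)  # keep metadata in sync with the new row count

new_index = torch.tensor([model.embeddings.num_embeddings - 1])
print(model.embeddings(new_index).shape)   # e.g. torch.Size([1, 500]) if embedding_dim is 500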