I would like to build a Transformer from scratch that predicts a certain number in a sequence.

However, I lack some understanding of how embeddings work. I understand the basic concept, but my mental model does not seem to match what happens in practice:

Assume I have a tensor of shape `torch.Size([8000, 4])`, i.e. 8000 sequences of length 4, where every element is an integer between 0 and 9:

```
tensor([[9, 9, 7, 8],
[2, 4, 1, 6],
[9, 7, 1, 0],
...,
[8, 7, 1, 4],
[4, 2, 8, 0],
[9, 1, 4, 7]])
```
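The printed tensor is truncated, so here is a minimal sketch of a tensor with the same shape and value range. The random values from `torch.randint` are an assumption standing in for the real data:

```python
import torch

torch.manual_seed(0)  # make the stand-in data reproducible

# Hypothetical stand-in: 8000 sequences of length 4, each value in [0, 9]
data = torch.randint(low=0, high=10, size=(8000, 4))

print(data.shape)  # torch.Size([8000, 4])
print(data.dtype)  # torch.int64
print(int(data.min()), int(data.max()))  # smallest and largest value present
```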

I now want to apply an embedding on that tensor:

```
import torch.nn as nn

embedding_layer = nn.Embedding(10, 500)  # 10 possible ids, 500-dim vectors
transform = embedding_layer(data)        # data: integer tensor of shape [8000, 4]
```
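A self-contained version of this snippet, again assuming hypothetical random stand-in data in place of the real tensor, confirms the output shape:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
data = torch.randint(0, 10, (8000, 4))  # hypothetical stand-in for the real data

# nn.Embedding(num_embeddings, embedding_dim): 10 possible ids, 500-dim vectors
embedding_layer = nn.Embedding(10, 500)
transform = embedding_layer(data)  # one 500-dim vector per integer in data

print(transform.shape)  # torch.Size([8000, 4, 500])
```

The input must be an integer tensor (int64 or int32), and every value must lie in `[0, 9]`; a float input or an index of 10 or more raises an error.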

As expected, the returned shape of `transform` is `torch.Size([8000, 4, 500])`, since every element has been mapped to a vector of the embedding dimension 500. But I also expect a dictionary (vocabulary) size of 10 somewhere. Where can I find it? Bluntly put: if it exists, I expect some attribute of the layer to show me 10.
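One way to check where the 10 lives: `nn.Embedding` keeps its lookup table as a learnable `weight` of shape `(num_embeddings, embedding_dim)` and stores both constructor arguments as the attributes `num_embeddings` and `embedding_dim`. The sketch below (with hypothetical random data) prints them and checks that the forward pass is plain row indexing into `weight`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
data = torch.randint(0, 10, (8000, 4))  # hypothetical stand-in data
embedding_layer = nn.Embedding(10, 500)
transform = embedding_layer(data)

print(embedding_layer.num_embeddings)  # 10  (vocabulary size)
print(embedding_layer.embedding_dim)   # 500
print(embedding_layer.weight.shape)    # torch.Size([10, 500])

# The forward pass returns row data[i, j] of the weight table for each element
same = torch.equal(transform, embedding_layer.weight[data])
print(same)  # True
```

The vocabulary size is not inferred from the data: it is whatever was passed as the first constructor argument, and it fixes the number of rows in `weight`.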

Can someone tell me if my understanding is correct? And if so, can PyTorch show the size of the vocabulary?