# How should I understand the num_embeddings and embedding_dim arguments for nn.Embedding?

Hello. I’m aware that this question (and many similar ones) has already been asked on this forum and Stack Overflow, but I’m still having trouble grasping how the concept works and wanted to ask a question based on a specific toy example that I went through.

I’m aware that the `num_embeddings` argument refers to how many elements we have in our vocabulary, and `embedding_dim` is simply referring to how many dimensions we want to make the embeddings.

The specific code that I tried is as follows:

``````import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)

a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]]) # (2, 4)

b = torch.LongTensor([[1, 2, 3], [2, 3, 1], [4, 5, 6], [3, 3, 3], [2, 1, 2],
[6, 7, 8], [2, 5, 2], [3, 5, 8], [2, 3, 6], [8, 9, 6],
[2, 6, 3], [6, 5, 4], [2, 6, 5]]) # (13, 3)

c = torch.LongTensor([[1, 2, 3, 2, 1, 2, 3, 3, 3, 3, 3],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]]) # (2, 11)
``````

If I run `a`, `b`, and `c` through `embedding`, I get embedding tensors of shape `(2, 4, 3)`, `(13, 3, 3)`, and `(2, 11, 3)`, respectively.

My question here is, shouldn’t `b` give me an index out of range error, since it’s a tensor consisting of 13 words each of dimension 3, and hence is outside the range of the predefined 10?

Any tips or pointers are appreciated. Thanks in advance.


I think when you do

``````embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
``````

then it means that you have 10 words and each of them is represented by an embedding of size 3. For example, if you have words like

``````hello
``````
``````world
``````

and so on, then each of them would be represented by 3 numbers. One example would be

``````hello -> [0.01 0.2 0.5]
world -> [0.04 0.6 0.7]
``````

and so on. If you do

``````list(embedding.parameters())
``````

then you will get something like this,

``````[Parameter containing:
tensor([[ 0.9227,  0.6492, -1.1440],
[ 1.5318, -0.2873, -0.7290],
[-0.4234, -1.7012, -0.9684],
[-0.2859,  1.4677, -1.4499],
[-1.8966, -1.4591,  0.5218],
[ 2.4023, -1.5395, -0.7947],
[-0.0464,  0.7174, -0.7452],
[ 0.9500, -0.4633,  0.5398],
[ 0.3458, -0.7997,  0.8895],
``````

which shows how each of these words is represented: row `i` of this matrix is the embedding of the word at index `i`.

When you do

``````a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]]) # (2, 4)
``````

and then

``````embedding(a).shape
``````

it gives

``````torch.Size([2, 4, 3])
``````

while

``````embedding(a)
``````

gives

``````tensor([[[ 1.5318, -0.2873, -0.7290],
[-0.4234, -1.7012, -0.9684],
[-0.2859,  1.4677, -1.4499],
[-1.8966, -1.4591,  0.5218]],

[[-1.8966, -1.4591,  0.5218],
[-0.2859,  1.4677, -1.4499],
[-0.4234, -1.7012, -0.9684],
``````

because you are retrieving the embeddings of those words. In other words, you are asking for the embedding of the word at index 1, then the embedding of the word at index 2, and so on, and it gives you the embeddings of the words at the indices you asked for.

when you do

``````b = torch.LongTensor([[1, 2, 3], [2, 3, 1], [4, 5, 6], [3, 3, 3], [2, 1, 2],
[6, 7, 8], [2, 5, 2], [3, 5, 8], [2, 3, 6], [8, 9, 6],
[2, 6, 3], [6, 5, 4], [2, 6, 5]]) # (13, 3)
embedding(b)
``````

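then it means: give me the embedding of the word at index 1, then the word at index 2, then 3, then 2, then 3, then 1, and so on.

Here, `a` and `b` contain the indices of the words whose embeddings you want to retrieve. Every value in `b` is between 1 and 9, so no index is out of range; the 13 is only the number of rows in `b`, not an index.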
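To make the lookup idea concrete, here is a minimal sketch that compares the layer's output with plain indexing into its weight table. It assumes the default settings of `nn.Embedding` (no `padding_idx`, no `max_norm`); the seed is only there so the run is repeatable.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # only for repeatability, the values themselves are random
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)

a = torch.tensor([[1, 2, 3, 4], [4, 3, 2, 1]])  # (2, 4), integer indices

out = embedding(a)            # (2, 4, 3)
manual = embedding.weight[a]  # index the (10, 3) weight table directly

# The layer is a table lookup: both routes return the same values.
same_as_indexing = torch.equal(out, manual)
# a[0, 1] is 2, so out[0, 1] is row 2 of the weight table.
row_matches = torch.equal(out[0, 1], embedding.weight[2])
print(same_as_indexing, row_matches)
```

So the values inside `a` select rows of the table, and the shape of `a` only decides how those rows are arranged in the output.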


The values in the input must all be between `0` and `num_embeddings - 1`:

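``````num_embeddings = embedding.num_embeddings  # 10 for the layer defined above
for i in range(num_embeddings + 4):
    try:
        # torch.arange(i) holds the indices 0, ..., i - 1
        embedding(torch.arange(i))
    except IndexError:
        # first failure is at i = 11, when index 10 appears
        print(f"failed for i={i}")
``````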

The input data can be any shape:

``````try:
    # random indices in [1, num_embeddings), arranged in a 5-axis tensor
    embedding(torch.randint(1, num_embeddings, size=(2, 3, 4, 5, 6)))
    print("it worked")
except IndexError:
    print("this won't print because this won't fail")
``````

Each entry in your input tensor is mapped to a vector with 3 coordinates (a 3-dimensional vector in mathematical terminology, but not in the sense of `PyTorch` `tensor`s), which can be found along the last axis of the output tensor. Namely, `a[i][j]` is mapped to the vector `embedding(a)[i][j]`, which is a `1`-dimensional `tensor` with 3 components.
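A short sketch of the shapes involved, reusing the layer size and the tensor `a` from the question (the variable names `vec`, `out_shape`, and so on are just for illustration):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
a = torch.tensor([[1, 2, 3, 4], [4, 3, 2, 1]])  # (2, 4)

out = embedding(a)
vec = out[0][1]  # the vector that a[0][1] was mapped to

out_shape = tuple(out.shape)  # input shape plus one trailing axis of size 3
vec_ndim = vec.dim()          # number of tensor axes of a single embedding
vec_shape = tuple(vec.shape)  # its 3 components
print(out_shape, vec_ndim, vec_shape)
```

The output shape is always the input shape with `embedding_dim` appended as the last axis.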

Interpretation of dimensions
Say, for example, your input `tensor` has `shape` `(13, 3)`, with values between `0` and `9`. If this were to represent text, then you can think of it as `13` samples of text, each containing `3` words, where each word is taken from a vocabulary of `10` words.


Hello. Do not forget that the `13` is just a dimension. What really matters is the index values inside the `b` tensor. Your number of embeddings is `10`, so every value in your input has to be between `0` and `9` (strictly less than `10`), and the `b` tensor satisfies that condition. The `13 by 3` tensor is therefore projected to a `13 by 3 by 3` tensor. Thank you.
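As a quick check of this on the `b` tensor from the question, the sketch below looks at the range of its values and then tries an index that is genuinely too large. It assumes the layer runs on the CPU, where an out-of-range index raises `IndexError`; on a GPU the failure shows up differently.

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)

b = torch.tensor([[1, 2, 3], [2, 3, 1], [4, 5, 6], [3, 3, 3], [2, 1, 2],
                  [6, 7, 8], [2, 5, 2], [3, 5, 8], [2, 3, 6], [8, 9, 6],
                  [2, 6, 3], [6, 5, 4], [2, 6, 5]])  # (13, 3)

# Only the values matter for the range check, not the number of rows.
largest = int(b.max())
in_range = 0 <= int(b.min()) and largest < embedding.num_embeddings
out_shape = tuple(embedding(b).shape)

# Index 10 does not exist in a table with rows 0..9.
try:
    embedding(torch.tensor([10]))
    raised = False
except IndexError:
    raised = True

print(largest, in_range, out_shape, raised)
```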

What is the use of embeddings? How do people use them?

Hello vainaijr,

I agree with you. I would add that PyTorch’s Tutorial specifically on Word Embeddings does a good job with communicating an intuition (https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html).

As you have kind of focused on the transformer architecture and also written about a use in CV, I just want to throw in this relatively new paper which could be interesting (https://openreview.net/forum?id=YicbFdNTTy).

Furthermore, I don’t think these embeddings (speaking about word embeddings) claim to consider order, that’s why we have positional encoding in transformers for instance.
I guess one of the main advantages in using (word) embeddings is that we have dense vectors and also the ability to ‘compare the meanings’ of words just using the embeddings.
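As a rough illustration of what "comparing" embeddings can look like, here is a sketch using cosine similarity. The three-word vocabulary and the `similarity` helper are hypothetical, and the layer is freshly initialised, so the numbers only become meaningful after the embeddings have been trained on some task.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical three-word vocabulary, for illustration only.
vocab = {"hello": 0, "world": 1, "earth": 2}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=5)

def similarity(word_a, word_b):
    """Cosine similarity between the embedding vectors of two words."""
    ids = torch.tensor([vocab[word_a], vocab[word_b]])
    with torch.no_grad():
        vec_a, vec_b = embedding(ids)  # unpack the two rows of the (2, 5) result
    return F.cosine_similarity(vec_a, vec_b, dim=0).item()

print(similarity("hello", "hello"))  # a vector compared with itself
print(similarity("world", "earth"))  # untrained, so this value is arbitrary
```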

Greetings,
Unity05


Thank you so much for the explanation, it clears up a lot of fog for me.
Do you by any chance know a simple example (beyond the PyTorch tutorial) that I can look into to understand this even better?

For Question Answering tasks, can we use nn.Embedding to represent role embeddings? In other words, the agent role embedding and the user role embedding are both trainable. The reasoning behind this is to use some embedding representations for agent and user utterances (GloVe, fastText, nn.Embedding, BERT embeddings, etc.), and we could add these trainable embeddings to the utterance representation according to the role utterance to help the model distinguish agent and user utterances.

For example, the authors of Multi-domain Dialogue State Tracking as Dynamic Knowledge Graph Enhanced Question Answering apply this idea (search for role embedding on the .pdf). I would like to do something similar with torch.

Hi,

Spontaneously, I don’t see a reason why it should not work. I didn’t know this paper before, and I’m curious whether you’ve tried it. If so, did it go well? ^^
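For what it’s worth, a minimal sketch of the idea could look like the following. It is not taken from the paper: the class name `RoleAwareEncoder`, the role ids `AGENT`/`USER`, the hidden size of 8, and the random tensor standing in for the utterance representations are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class RoleAwareEncoder(nn.Module):
    """Adds a trainable role vector to each utterance representation."""

    def __init__(self, hidden_dim, num_roles=2):
        super().__init__()
        # One trainable vector per role (here: agent and user).
        self.role_embedding = nn.Embedding(num_roles, hidden_dim)

    def forward(self, utterance_repr, role_ids):
        # utterance_repr: (batch, hidden_dim), role_ids: (batch,)
        return utterance_repr + self.role_embedding(role_ids)

AGENT, USER = 0, 1  # hypothetical role ids
encoder = RoleAwareEncoder(hidden_dim=8)

utterances = torch.randn(4, 8)  # stand-in for GloVe/BERT/etc. features
roles = torch.tensor([AGENT, USER, AGENT, USER])

out = encoder(utterances, roles)

# The role vectors receive gradients, so they are trained with the model.
out.sum().backward()
has_grad = encoder.role_embedding.weight.grad is not None
print(tuple(out.shape), has_grad)
```

Adding the role vector (rather than concatenating it) keeps the hidden size unchanged, which is one possible design choice among several.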