Hello. I’m aware that this question (and many similar ones) has already been asked on this forum and on Stack Overflow, but I’m still having trouble grasping how the concept works, so I wanted to ask about a specific toy example that I went through.
I’m aware that the num_embeddings argument refers to how many elements we have in our vocabulary, and that embedding_dim simply refers to how many dimensions we want the embeddings to have.
The specific code that I tried is as follows:
import torch
import torch.nn as nn
embedding = nn.Embedding(num_embeddings=10, embedding_dim=3)
a = torch.LongTensor([[1, 2, 3, 4], [4, 3, 2, 1]]) # (2, 4)
b = torch.LongTensor([[1, 2, 3], [2, 3, 1], [4, 5, 6], [3, 3, 3], [2, 1, 2],
[6, 7, 8], [2, 5, 2], [3, 5, 8], [2, 3, 6], [8, 9, 6],
[2, 6, 3], [6, 5, 4], [2, 6, 5]]) # (13, 3)
c = torch.LongTensor([[1, 2, 3, 2, 1, 2, 3, 3, 3, 3, 3],
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]]) # (2, 11)
If I run a, b, and c through embedding, I get embedding tensors of shape (2, 4, 3), (13, 3, 3), and (2, 11, 3), respectively.
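For reference, this is roughly how I checked those shapes (continuing directly from the snippet above, so embedding, a, b, and c are the objects defined there):

# Continuing from the code above; the printed sizes are the shapes reported above.
print(embedding(a).shape)  # torch.Size([2, 4, 3])
print(embedding(b).shape)  # torch.Size([13, 3, 3])
print(embedding(c).shape)  # torch.Size([2, 11, 3])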
My question is: shouldn’t b give me an index out of range error, since it’s a tensor consisting of 13 words (each of dimension 3), and 13 is outside the predefined range of 10?
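To make my confusion concrete, here is a small sketch of the kind of input I would have expected b to be analogous to (the tensor d and the value 12 are just hypothetical examples I made up, with 12 chosen because it is clearly at or above 10):

# Hypothetical input for comparison: only one row, but it contains the index
# value 12, which is >= num_embeddings=10.
d = torch.LongTensor([[12, 2, 3]])
# embedding(d)  # I assume this would raise an index-out-of-range error,
#               # and my confusion is why b, with its 13 rows, doesn't do the same.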
Any tips or pointers are appreciated. Thanks in advance.