How does nn.Embedding work?

Both nn.Linear and nn.Embedding will give you, in your example, a 3-dim vector. That’s the whole point, i.e., to convert a word into an ideally meaningful vector (i.e., a numeric, fixed-size representation of a word). The difference is w.r.t. the input:

  • nn.Linear expects a one-hot vector of the size of the vocabulary with the single 1 at the index representing the specific word

  • nn.Embedding just expects this index (and not a whole vector)

However, if both nn.Linear and nn.Embedding were initialized with the same weights, their outputs would be exactly the same.
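Here is a minimal sketch of that equivalence (the layer sizes and the index 4 are made-up values for illustration): a bias-free nn.Linear whose weight matrix is the transpose of the embedding matrix gives the same output for a one-hot vector as nn.Embedding gives for the corresponding index.

import torch
import torch.nn as nn

vocab_size, emb_dim = 10, 3
embedding = nn.Embedding(vocab_size, emb_dim)

# Bias-free linear layer sharing the same weights (nn.Linear stores them transposed)
linear = nn.Linear(vocab_size, emb_dim, bias=False)
with torch.no_grad():
    linear.weight.copy_(embedding.weight.t())

idx = torch.tensor([4])                                    # index input for nn.Embedding
one_hot = nn.functional.one_hot(idx, vocab_size).float()   # one-hot input for nn.Linear

print(torch.allclose(embedding(idx), linear(one_hot)))     # True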

Yes, by default, the weights of both layers will be modified during the training process. In this respect, they are like any other layer in your network. However, you can tell the network not to modify the weights of any specific layer; I think it would look something like this:

import torch.nn as nn

embedding = nn.Embedding(10, 3)
embedding.weight.requires_grad = False  # freeze the embedding weights

This makes sense if you use pretrained word embeddings such as Word2Vec or GloVe. If you initialize your weights randomly, you certainly want them to be modified during training.
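For the pretrained case, nn.Embedding.from_pretrained can do both steps at once; the weight tensor below is a made-up stand-in for vectors you would actually load from Word2Vec or GloVe:

import torch
import torch.nn as nn

# Stand-in for real pretrained vectors (shape: vocab_size x emb_dim)
pretrained = torch.randn(10, 3)

# freeze=True sets requires_grad=False, so the vectors stay fixed during training
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)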

PyTorch has a decent tutorial on this:

https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html

Yes, the embedding layer is learnable. Ideally, the model learns its own vector representations of the words, and the embedding space is where that semantic meaning gets defined.
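A minimal sketch showing that the embedding weights are ordinary trainable parameters (the sizes, indices, and dummy loss are made up for illustration):

import torch
import torch.nn as nn

embedding = nn.Embedding(10, 3)
print(embedding.weight.requires_grad)  # True by default, so the optimizer will update it

optimizer = torch.optim.SGD(embedding.parameters(), lr=0.1)
loss = embedding(torch.tensor([1, 2])).sum()  # dummy loss over two looked-up vectors
loss.backward()
optimizer.step()  # the rows for indices 1 and 2 have now changed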

That is also why there are downloadable pretrained vectors. However, use some caution with these: they were built from scraped data and contain a lot of irrelevant representations, such as website links and pure gibberish.