How is nn.Embedding trained?

When I have a vocab size of 40000 and want to embed each word into 300 dimensions,

I use nn.Embedding(40000, 300)

Then how are the embeddings trained? Since this is not a word2vec-style task, there is no label for each word.


Embedding is not for training, it's a lookup table. You first map each word in the vocabulary to a unique integer index, and then nn.Embedding just maps this index to a vector of size 300.
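A minimal sketch of the lookup behavior, using the sizes from the question (40000 words, 300 dimensions); the indices here are arbitrary illustration values:

```python
import torch
import torch.nn as nn

# Lookup table with 40000 rows (one per word) and 300 columns (embedding dim).
embedding = nn.Embedding(40000, 300)

# Integer word indices are mapped to their corresponding rows.
indices = torch.tensor([0, 5, 123])
vectors = embedding(indices)
print(vectors.shape)  # torch.Size([3, 300])
```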

If the 300-dimensional vector is not trained, that word vector does not represent relations among words, right?

I found this post very helpful. I think nn.Embedding just initializes the lookup table, and thereafter you train it with gradient descent.


nn.Embedding acts like a trainable lookup table.
The relations between words will be learned during its training.
This blog post might be useful to get some intuition on this layer.


The 40,000 word vectors are learned as just another parameter of the network that you train. There is a lot of literature on pretraining word embeddings using LSA or the W2V algorithms, but by initializing random vectors (here, of dimension 300) and applying updates to them via backpropagation, we can learn good approximations of such vectors that are tuned to the objective of your NN.
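A toy sketch of that idea (the model and sizes here are hypothetical, chosen small for illustration): the embedding matrix is an ordinary parameter, so an optimizer step after backpropagation updates the rows that were looked up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny hypothetical model: embedding -> linear classifier.
embedding = nn.Embedding(10, 4)
linear = nn.Linear(4, 2)
optimizer = torch.optim.SGD(
    list(embedding.parameters()) + list(linear.parameters()), lr=0.1
)

indices = torch.tensor([1, 3])   # words appearing in this "batch"
targets = torch.tensor([0, 1])   # arbitrary labels for the toy objective

row_used_before = embedding.weight[1].clone()
row_unused_before = embedding.weight[0].clone()

loss = nn.functional.cross_entropy(linear(embedding(indices)), targets)
loss.backward()
optimizer.step()

# Rows 1 and 3 of the embedding matrix were updated; row 0 was not touched.
print(torch.allclose(row_used_before, embedding.weight[1]))    # False
print(torch.allclose(row_unused_before, embedding.weight[0]))  # True
```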


Thanks @ptrblck. But I have another question: how does PyTorch learn these embeddings? Does it use a word2vec SkipGram or CBOW kind of model? Can you kindly provide more details?

An embedding layer is a simple lookup table accepting a sparse input (word indices) which will be mapped to a dense representation (feature tensor). The embedding weight matrix will get gradients and will thus be updated. SkipGram etc. refer to training techniques, and your model might use embedding layers for them.
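You can see the gradient flow directly in a small sketch (toy sizes, arbitrary index): after a backward pass, only the looked-up row of the weight matrix receives a nonzero gradient.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(5, 3)

# Look up a single word (index 2) and backprop a scalar through it.
out = emb(torch.tensor([2]))
out.sum().backward()

# d(sum)/d(row 2) is a vector of ones; all other rows get zero gradient.
print(emb.weight.grad[2])  # tensor([1., 1., 1.])
print(emb.weight.grad[0])  # tensor([0., 0., 0.])
```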
