Embedding Class in NLP Tutorial

I’m going through this tutorial and I’m a bit stuck on the final exercise. I’ve seen lots of solutions that look like this one, where a single linear layer maps the output of nn.Embedding to an output of size vocab_size.
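Roughly this shape (the class name and sizes below are made up for illustration, not the actual linked solution):

```python
import torch.nn as nn

# Hypothetical sketch of the solution structure described above
class TinyModel(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)  # single linear layer

    def forward(self, idxes):
        # look up the vectors, then map them to logits over the vocabulary
        return self.linear(self.embeddings(idxes))
```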

Does the nn.Embedding class get updated by backpropagation? In other words, is it just a lookup table (as the docs imply) or is it actually the first layer of the neural net? If it does get updated, how does that work?

If it doesn’t get updated, how does the structure in the solution I listed above generate new embeddings?

EDIT: After some testing, I can see that the values in the nn.Embedding class do change, so they must be getting updated during backpropagation. How is this happening? Is there a doc anywhere that explains this? I couldn’t figure it out by looking at the nn.Embedding source.
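For reference, a minimal version of the test I mean (toy sizes, illustrative names):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy setup mirroring the structure above (sizes are made up)
vocab_size, embedding_dim = 10, 4
emb = nn.Embedding(vocab_size, embedding_dim)
fc = nn.Linear(embedding_dim, vocab_size)
optimizer = torch.optim.SGD(
    list(emb.parameters()) + list(fc.parameters()), lr=0.1
)

idxes = torch.tensor([1, 2, 3])      # a fake batch of token indices
targets = torch.tensor([2, 3, 4])    # fake next-token targets

before = emb.weight.clone()
loss = nn.functional.cross_entropy(fc(emb(idxes)), targets)
loss.backward()
optimizer.step()

print(emb.weight.grad is not None)      # True: the lookup table got a gradient
print(torch.equal(before, emb.weight))  # False: the embedding values changed
```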


The nn.Embedding class is a lookup table mapping each index to a vector. In effect it is a matrix: the weight matrix of the embedding module.
Changing those vectors by a little bit changes the downstream computation by a small amount and, eventually, the (typically cross-entropy) loss. The vectors therefore get a gradient in the backward step, and the optimizer updates them like any other parameter.
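A small way to see this, with a sum standing in for a real loss (sizes are arbitrary): only the rows that were actually looked up receive a nonzero gradient.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb = nn.Embedding(5, 3)             # 5 vectors of size 3 (toy sizes)

out = emb(torch.tensor([0, 2]))      # look up rows 0 and 2
out.sum().backward()                 # any scalar "loss" works for this demo

# Rows 0 and 2 have nonzero gradients; the other rows are untouched.
# Autograd treats emb.weight like any other parameter.
print(emb.weight.grad)
```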

You could replace the nn.Embedding with a single parameter a of dimension vocab_size × embedding_size and use indexing, a[idxes], to fetch the vectors; the result is mathematically equivalent, unless you use fancy features like inverse frequency scaling of gradients (nn.Embedding’s scale_grad_by_freq option).
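A quick sketch of that equivalence, reusing the module’s own weight as the parameter (toy sizes):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab_size, embedding_size = 5, 3    # arbitrary toy sizes

emb = nn.Embedding(vocab_size, embedding_size)
a = emb.weight                       # the vocab_size x embedding_size matrix
idxes = torch.tensor([1, 4, 1])

# The module's forward is just row indexing into its weight matrix
print(torch.equal(emb(idxes), a[idxes]))  # True
```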

Best regards

Thomas