Training Word Embedding with Categorical Vector

As far as I am aware, the Embedding class’s “forward pass” converts each word index in a LongTensor into its corresponding word embedding. However, I am wondering if there is a way to adapt Embedding to accept a categorical vector instead of a word index. Specifically, each vector represents a distribution over the entire vocabulary, and the Embedding would return a linear combination of the word embeddings weighted by that vector. So, for example:

If the vector is [0.7, 0.2, 0.0, 0.1], the resulting embedding would be 0.7*emb_0 + 0.2*emb_1 + 0.0*emb_2 + 0.1*emb_3.
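In other words, something like this minimal sketch (the names `emb` and `probs` are just placeholders I made up to illustrate the operation):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 4, 3
emb = nn.Embedding(vocab_size, embed_dim)

# One "soft" word: a distribution over the whole vocabulary.
probs = torch.tensor([0.7, 0.2, 0.0, 0.1])

# Desired result: a weighted sum of the embedding rows,
# i.e. 0.7*emb.weight[0] + 0.2*emb.weight[1] + 0.0*emb.weight[2] + 0.1*emb.weight[3]
soft_embedding = probs @ emb.weight   # shape: (embed_dim,)
```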

While I suppose one way to do this would be to create a matrix of shape [vocab_size, embedding_size] to substitute for the Embedding, I have some pretrained Embeddings that I would prefer to continue using.

Any help is appreciated.

I haven’t tried it, but this should work just fine, since both the embedding layer and your “manual” linear layer have the same shape, (vocab_size, embed_dim). And it should be possible to set the weights of the linear layer to the pretrained word vectors.
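Roughly along these lines (just a sketch; `pretrained` stands in for your pretrained embedding matrix of shape (vocab_size, embed_dim), and note that `nn.Linear` stores its weight as (out_features, in_features), hence the transpose when copying):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 10000, 300
pretrained = torch.randn(vocab_size, embed_dim)  # stand-in for your pretrained embeddings

# Linear layer mapping a vocab-sized distribution to an embedding vector.
# No bias, so the output is exactly the weighted sum of the embedding rows.
soft_emb = nn.Linear(vocab_size, embed_dim, bias=False)

# nn.Linear.weight has shape (out_features, in_features) = (embed_dim, vocab_size),
# so copy the pretrained matrix transposed.
with torch.no_grad():
    soft_emb.weight.copy_(pretrained.t())

probs = torch.softmax(torch.randn(2, vocab_size), dim=-1)  # batch of 2 distributions
out = soft_emb(probs)                                       # shape: (2, embed_dim)
```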

Thank you for your reply. I have a follow-up question:

For Embedding, there is a padding_idx argument. As I understand it, the row of weights corresponding to padding_idx is initialized to zero and never updated during backpropagation, so looking up the padding index always returns an all-zero vector.

Is it possible to do something similar for the weights of a Linear module? Specifically, is there a way to keep a certain row of weights initialized to zero and never updated by backpropagation?

Well, I guess you can set the weights to zero manually in your forward method. Before pushing the tensor through the linear layer, you first set the weights at position padding_idx to zero. In that case, you don’t need to worry whether there is any update through backpropagation. I mean, there is, but you overwrite it each time before “using” the linear layer.
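Something like this sketch, for example. Keep in mind that because `nn.Linear` stores its weight as (embed_dim, vocab_size), the “row” for padding_idx is actually a column of the linear layer’s weight:

```python
import torch
import torch.nn as nn

class SoftEmbedding(nn.Module):
    """Linear 'embedding' layer that keeps one vocabulary position zeroed out."""
    def __init__(self, vocab_size, embed_dim, padding_idx):
        super().__init__()
        self.linear = nn.Linear(vocab_size, embed_dim, bias=False)
        self.padding_idx = padding_idx

    def forward(self, probs):
        # Re-zero the padding column on every call, so any update from the
        # previous backward pass is overwritten before the weights are used.
        with torch.no_grad():
            self.linear.weight[:, self.padding_idx].zero_()
        return self.linear(probs)
```

With this, any probability mass that falls on padding_idx simply contributes nothing to the output, mimicking the padding_idx behavior of Embedding.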