[nn.EmbeddingBag]: How to weight embeddings before summation

weichium · October 15, 2017, 12:50am

Hi,

According to the current implementation: for bags of constant length, nn.EmbeddingBag with mode='sum' is equivalent to nn.Embedding followed by torch.sum(dim=1). In other words, the embeddings all have the same weight, i.e. 1. I wonder whether it is possible to weight the embeddings before summing them up, or whether there is an efficient way to do so?
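To make the equivalence concrete, here is a minimal sketch that checks it numerically. The table size, embedding dimension and index values are made up for illustration; the only thing taken from the post is the claim being tested.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, chosen only for this illustration.
num_embeddings, dim = 10, 4
bag = nn.EmbeddingBag(num_embeddings, dim, mode="sum")
emb = nn.Embedding(num_embeddings, dim)

# Make both modules use the same embedding table.
with torch.no_grad():
    emb.weight.copy_(bag.weight)

# Two bags of constant length 3; with a 2D input each row is one bag.
indices = torch.tensor([[1, 2, 4], [5, 4, 3]])

out_bag = bag(indices)             # shape (2, 4)
out_sum = emb(indices).sum(dim=1)  # shape (2, 4), via a (2, 3, 4) intermediate
```

Both results should agree up to floating-point error, which can be checked with torch.allclose(out_bag, out_sum, atol=1e-6).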

Currently, I first use nn.Embedding to extract the embeddings, then multiply them by the weights, and finally sum them up. This, however, is very inefficient, since the intermediate embeddings have to be instantiated. Is it possible to add such a feature? Or could you perhaps provide some hints on how to do this efficiently?
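For reference, the three-step approach described above might look roughly like this. It is only a sketch: the sizes, the indices and the random weights tensor are placeholders, not values from my actual model.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

emb = nn.Embedding(10, 4)                       # hypothetical vocabulary and dimension
indices = torch.tensor([[1, 2, 4], [5, 4, 3]])  # two bags of length 3
weights = torch.rand(2, 3)                      # hypothetical per-token weights

vectors = emb(indices)                      # (2, 3, 4): per-token embeddings are materialized
weighted = vectors * weights.unsqueeze(-1)  # (2, 3, 4): a second intermediate of the same size
pooled = weighted.sum(dim=1)                # (2, 4): one weighted sum per bag
```

The two (batch, length, dim) intermediates are what makes this costly compared with nn.EmbeddingBag, which only returns the (batch, dim) result.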

Thanks for the help!

tom · October 15, 2017, 5:38pm

Hi,

you can reformulate this as a matrix (of stacked embedding vectors) multiplied by a (weight) vector. Then (possibly after sprinkling a transpose on the weight vector and the result) you can use torch.mv.
For batch operation it might be easier to make the weight vector into an n×1 matrix and use torch.matmul.
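A rough sketch of what I have in mind, with made-up sizes and random weights (the variable names are just for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

emb = nn.Embedding(10, 4)                       # hypothetical vocabulary and dimension
indices = torch.tensor([[1, 2, 4], [5, 4, 3]])  # batch of 2 bags, n = 3 tokens each
weights = torch.rand(2, 3)                      # hypothetical per-token weights

vectors = emb(indices)  # (2, 3, 4): stacked embedding vectors per bag

# Single bag: (dim x n) matrix times a length-n weight vector.
pooled_single = torch.mv(vectors[0].t(), weights[0])  # shape (4,)

# Batch: turn each weight vector into an n x 1 matrix and use matmul.
pooled_batch = torch.matmul(
    vectors.transpose(1, 2),  # (2, 4, 3)
    weights.unsqueeze(-1),    # (2, 3, 1)
).squeeze(-1)                 # (2, 4)
```

Note that this still starts from the per-token output of nn.Embedding; it only folds the multiply and the sum into one call.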

Best regards

Thomas

weichium · October 15, 2017, 6:49pm

Hi Thomas,

Thanks for your reply. I guess my question is not clear enough. Here is a more concrete example:

Say I have two sentences of different lengths. When computing the embedding of the two sentences, one naive way is simply to use nn.EmbeddingBag with mode='sum' or mode='mean'. However, what I want is a sort of attention over the embeddings, i.e. rather than w_1 + w_2 + … + w_n, I’m looking for a_1 * w_1 + … + a_n * w_n, where both the attention weights a and the embeddings w are learnable. So what I did was first use nn.Embedding to extract the vectors, then weight them. Finally, since sentences may have different lengths, I used nn.EmbeddingBag to sum the corresponding word embeddings by providing the offsets.

The solution you suggest seems to me like a dense version, where there will be lots of zeros in the matrix. I believe it will consume far more memory. What I’m looking for is a memory-efficient (sparse) way to deal with this. Please correct me if I’m misunderstanding something.

tom · October 16, 2017, 4:04am

Hi,

indeed, you have me confused.
I thought that you had embeddings “per seen word” from nn.Embedding (and could afford the memory for them) and wanted a weighted sum over them without an intermediate “multiply with weights” step.

As far as I understand, EmbeddingBag avoids the “per seen word” memory allocation by (in its slow CPU version) using Tensor.index_add_. Indeed, I am unaware of a way to do this and have the weights applied in the same step.
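For variable-length sentences, the closest thing I can sketch by hand uses index_add_ on already weighted per-token embeddings. To be clear, this does not avoid the per-token allocation, it only avoids padding. The indices, offsets and weights below are invented for the example, and bag_ids is a helper I introduce here, not something EmbeddingBag exposes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

emb = nn.Embedding(10, 4)                      # hypothetical vocabulary and dimension
flat_indices = torch.tensor([1, 2, 4, 5, 4])   # sentence 0 has 3 tokens, sentence 1 has 2
offsets = torch.tensor([0, 3])                 # start position of each sentence
weights = torch.rand(5, requires_grad=True)    # hypothetical learnable per-token weights

# Derive, for every token, the id of the bag (sentence) it belongs to.
ends = torch.cat([offsets[1:], torch.tensor([flat_indices.numel()])])
lengths = ends - offsets
bag_ids = torch.repeat_interleave(torch.arange(len(offsets)), lengths)  # [0, 0, 0, 1, 1]

# Per-token embeddings are still materialized here: shape (5, 4).
weighted = emb(flat_indices) * weights.unsqueeze(-1)

# Accumulate every weighted token vector into the row of its bag.
pooled = torch.zeros(len(offsets), emb.embedding_dim)
pooled.index_add_(0, bag_ids, weighted)        # shape (2, 4)
```

Gradients flow to both the embedding table and the weights, so the weights can be trained in this formulation.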

Best regards

Thomas

MANSUM · November 28, 2017, 2:43pm

Can anyone answer this question?

368e621b293cf07f16ca · June 7, 2018, 1:25pm

It would also be awesome to train those weights in the weighted sum

marcwww · April 16, 2019, 7:23am

You could weight the embeddings first, then use F.embedding_bag to sum them efficiently.
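A related option, sketched here under the assumption that your PyTorch release has the per_sample_weights argument of F.embedding_bag (it is only supported with mode='sum'), is to hand the weights to the bag operation directly. The table, indices, offsets and weights below are placeholders for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

table = torch.randn(10, 4)                     # hypothetical embedding table
flat_indices = torch.tensor([1, 2, 4, 5, 4])   # sentence 0 has 3 tokens, sentence 1 has 2
offsets = torch.tensor([0, 3])                 # start position of each sentence
weights = torch.rand(5)                        # hypothetical per-token weights, same shape as flat_indices

# Weighted sum per bag; the weighting is passed to the bag operation itself.
pooled = F.embedding_bag(
    flat_indices,
    table,
    offsets,
    mode="sum",
    per_sample_weights=weights,
)  # shape (2, 4)
```

Each output row is the sum of weights[i] * table[flat_indices[i]] over the tokens in that sentence.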

AndyTengWei · August 10, 2019, 6:25am

Do you mean attention in a DL model?