# [nn.EmbeddingBag]: How to weight embeddings before summation

Hi,

According to the current implementation, for bags of constant length, `nn.EmbeddingBag` with `mode='sum'` is equivalent
to `nn.Embedding` followed by `torch.sum(dim=1)`. In other words, the embeddings all have the same weight, i.e. `1`. I wonder whether it is possible to weight the embeddings before summing them up, or whether there is any efficient way to do so?
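For reference, the equivalence described above can be checked with a small sketch (the sizes and indices here are made up for illustration):

```python
import torch
import torch.nn as nn

# Check the claimed equivalence for constant-length bags.
torch.manual_seed(0)
emb = nn.Embedding(10, 4)
bag = nn.EmbeddingBag(10, 4, mode='sum')
bag.weight = emb.weight                      # share the same embedding table

idx = torch.tensor([[1, 2, 5], [3, 7, 0]])   # two bags, both of length 3
out_bag = bag(idx)                           # (2, 4)
out_sum = emb(idx).sum(dim=1)                # (2, 4) -- same result
```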

Currently, what I do is first use `nn.Embedding` to extract the embeddings, then multiply them by the weights, and finally sum them up. This, however, is very inefficient since we need to instantiate the intermediate embeddings. Would it be possible to add such a feature? Or perhaps could you provide some hints on how to do this efficiently?
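A minimal sketch of this embed-then-weight-then-sum approach (names and sizes are illustrative; in practice `a` would be a learnable parameter or an attention output rather than a constant):

```python
import torch
import torch.nn as nn

num_words, dim = 10, 4
emb = nn.Embedding(num_words, dim)

idx = torch.tensor([[1, 2, 5]])          # one bag of 3 word indices, shape (1, 3)
a = torch.tensor([[0.2, 0.3, 0.5]])      # per-word weights, shape (1, 3)

vecs = emb(idx)                          # (1, 3, dim) -- the intermediate tensor
weighted_sum = (a.unsqueeze(-1) * vecs).sum(dim=1)   # (1, dim)
```

The `vecs` tensor is exactly the intermediate allocation the question is about avoiding.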

Thanks for the help!


Hi,

you can reformulate this as a matrix (of stacked embedding vectors) multiplied by a (weight) vector. Then (possibly after sprinkling a transpose on the weight vector and the result) you can use `torch.mv`.
For a batch operation it may be easier to turn the weight vector into an `n x 1` matrix and use `torch.matmul`.
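A sketch of this formulation, assuming the bag's embedding vectors are already stacked into a matrix `E` (all names and sizes here are illustrative):

```python
import torch

n, dim = 3, 4
E = torch.randn(n, dim)                  # stacked embedding vectors of one bag
a = torch.tensor([0.2, 0.3, 0.5])        # weight vector

# E^T @ a is the weighted sum of the rows of E.
weighted = torch.mv(E.t(), a)            # shape (dim,)

# Batched version: weights as a (batch, 1, n) matrix with torch.matmul.
batch = 2
Eb = torch.randn(batch, n, dim)
ab = torch.rand(batch, 1, n)
weighted_b = torch.matmul(ab, Eb).squeeze(1)   # (batch, dim)
```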

Best regards

Thomas

Hi Thomas,

Thanks for your reply. I guess my question is not clear enough. Here is a more concrete example:

Say I have two sentences of different lengths. When computing the embedding of each of the two sentences, one naive way to do this is simply to use `nn.EmbeddingBag` with `mode='sum'` or `mode='mean'`. However, what I want is a sort of attention over the embeddings, i.e. rather than w_1 + w_2 + … + w_n, I'm looking for a_1 * w_1 + … + a_n * w_n, where both the attention weights `a` and the embeddings `w` are learnable. So what I did is first use `nn.Embedding` to extract the vectors, then weight them. Finally, since multiple sentences may have different lengths, I used `nn.EmbeddingBag` to sum the corresponding word embeddings by providing the `offsets`.

The solution that you suggest seems to me like a `dense` version, where there will be lots of `0`s in the matrix. I believe it will consume way more memory. What I'm looking for is a memory-efficient (sparse) way to deal with this. Please correct me if I'm misunderstanding something.

Hi,

indeed, you have me confused.
I thought that you had embeddings "per seen word" from `nn.Embedding` (and could afford the memory for them) and wanted a weighted sum over them without an intermediate "multiply with weights" step.

As far as I understand, `EmbeddingBag` avoids the "per seen word" memory allocation by (in its slow CPU version) using `Tensor.index_add_`. Indeed, I am unaware of a way to do this and have the weights applied in the same step.
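For what it's worth, the variable-length weighted sum itself can be written with `Tensor.index_add_` after the per-word multiply, though this still materializes the intermediate weighted vectors, so it does not remove the allocation the question is about (a sketch with made-up sizes and a hypothetical `bag` index tensor playing the role of `offsets`):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)
words = torch.tensor([1, 2, 5, 3, 7])        # sentence 1: [1, 2, 5]; sentence 2: [3, 7]
a = torch.tensor([0.2, 0.3, 0.5, 0.6, 0.4])  # per-word weights (learnable in practice)
bag = torch.tensor([0, 0, 0, 1, 1])          # which sentence each word belongs to

vecs = emb(words) * a.unsqueeze(1)           # (5, 4) intermediate weighted vectors
out = torch.zeros(2, 4).index_add_(0, bag, vecs)   # per-sentence weighted sums, (2, 4)
```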

Best regards

Thomas