According to the current implementation: for bags of constant length, `mode='sum'` is equivalent to `nn.Embedding` followed by `torch.sum(dim=1)`. In other words, the embeddings all have the same weight, i.e. 1. I wonder if it is possible to weight the embeddings before summing them up, or if there is any efficient way to do so?
Currently what I do is first use `nn.Embedding` to extract the embeddings, then multiply them by the weights, and finally sum them up. This, however, is very inefficient, since we need to instantiate the intermediate embeddings. Would it be possible to add such a feature? Or perhaps provide some hints on how to do this efficiently.
Thanks for the help!
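The inefficient route described in the question can be sketched as follows (all sizes and names here are illustrative, not from the original post):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
num_words, dim = 10, 4
emb = nn.Embedding(num_words, dim)

# A batch of 2 "sentences" of constant length 3, with one scalar
# weight per token (these would be learnable in practice).
idx = torch.tensor([[1, 2, 3], [4, 5, 6]])
weights = torch.rand(2, 3)

# The wasteful part: the full (2, 3, dim) tensor of per-token
# embeddings is materialized before the weighted sum.
vecs = emb(idx)                                           # (2, 3, dim)
weighted_sum = (vecs * weights.unsqueeze(-1)).sum(dim=1)  # (2, dim)
```

The intermediate `vecs` tensor is what the question hopes to avoid.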
You can reformulate this as a matrix (of stacked embedding vectors) multiplied with a (weight) vector. Then (possibly after sprinkling a transpose on the weight vector and the result) you can use a matrix–vector product.
For a batch operation it might be easier to make the weight vector into an n×1 matrix and use a batched matrix multiply.
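The specific function names were lost from the post above; one way to realize the batched suggestion is `torch.bmm` (shapes here are illustrative):

```python
import torch

torch.manual_seed(0)
batch, n, dim = 2, 3, 4
stacked = torch.rand(batch, n, dim)  # stacked embedding vectors
w = torch.rand(batch, n)             # one weight per vector

# View the weights as a (1 x n) matrix per batch element and
# batch-multiply: (batch, 1, n) @ (batch, n, dim) -> (batch, 1, dim).
out = torch.bmm(w.unsqueeze(1), stacked).squeeze(1)

# Equivalent to the explicit multiply-and-sum formulation.
ref = (stacked * w.unsqueeze(-1)).sum(dim=1)
```

Note this still assumes the stacked embeddings already exist in memory, which is the crux of the follow-up below.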
Thanks for your reply. I guess my question was not clear enough. Here is a more concrete example:
Say I have two sentences of different lengths. When computing the embedding of the two sentences, one naive way to do this is simply using `mode='mean'`. However, what I want is a sort of attention on the embeddings, i.e. rather than w_1 + w_2 + … + w_n, I'm looking for a_1 * w_1 + … + a_n * w_n, where both the attention weights `a` and the embeddings `w` are learnable. So what I did is first use `nn.Embedding` to extract the vectors, then weight them. Finally, since multiple sentences may have different lengths, I used `nn.EmbeddingBag` to sum the corresponding word embeddings by providing the offsets.
The solution that you suggest seems to me like a dense version, where there will be lots of 0s in the matrix. I believe it will consume way more memory. What I'm looking for is a memory-efficient (sparse) way to deal with this. Please correct me if I'm misunderstanding something.
Indeed, you have me confused.
I thought that you had embeddings "per seen word" from `nn.Embedding` (and could afford the memory for them) and wanted a weighted sum over them without an intermediate "multiply with weights" step.
As far as I understand, `EmbeddingBag` avoids the "per seen word" memory allocation by (in its slow CPU version) using `Tensor.index_add_`. Indeed, I am unaware of a way to do this with the weights applied in the same step.
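For variable-length sentences, the `index_add_` idea can be combined with the weighting by hand. A minimal sketch (names and sizes are illustrative; the word vectors are assumed to be already looked up):

```python
import torch

torch.manual_seed(0)
dim = 4
# Two sentences of lengths 3 and 2, flattened into one list of
# word vectors, as EmbeddingBag consumes them.
flat = torch.rand(5, dim)               # per-word embedding vectors
a = torch.rand(5)                       # one attention weight per word
bag_id = torch.tensor([0, 0, 0, 1, 1])  # which sentence each word belongs to

# Weight first, then scatter-add into per-sentence slots. There is no
# padded (batch, max_len, dim) tensor, only the flat (num_words, dim) one.
out = torch.zeros(2, dim)
out.index_add_(0, bag_id, flat * a.unsqueeze(-1))
```

This still materializes the flat per-word embeddings, but it never pads to the longest sentence, which is where the dense formulation wastes memory.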
Can anyone answer this question?
It would also be awesome to be able to train those weights in the weighted sum.
You could use `F.embedding_bag` with its `per_sample_weights` argument to weight the embeddings and sum them efficiently.
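Newer PyTorch versions (1.1+) expose exactly this via the `per_sample_weights` argument of `F.embedding_bag` (supported for `mode='sum'`). A minimal sketch with illustrative sizes:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
table = torch.rand(10, 4, requires_grad=True)  # learnable embedding matrix
idx = torch.tensor([1, 2, 3, 4, 5])            # two sentences, flattened
offsets = torch.tensor([0, 3])                 # start of each sentence
a = torch.rand(5, requires_grad=True)          # learnable per-word weights

# Lookup, weighting, and segment-sum happen in one fused call:
# out[0] = a[0]*table[1] + a[1]*table[2] + a[2]*table[3], etc.
out = F.embedding_bag(idx, table, offsets, mode='sum',
                      per_sample_weights=a)
```

Since both `table` and `a` carry `requires_grad=True`, gradients flow to the embeddings and the attention weights, which is what the original question asked for.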
Do you mean attention as in deep learning models?