Attention on nn.EmbeddingBag

Is there any way to apply attention with nn.EmbeddingBag?

In other words, the current implementation of nn.EmbeddingBag sums up or computes the mean vector of the multiple indices given.

Instead of a simple mean or sum, I want to compute a weighted sum.

So, instead of 1/3*(e1+e2+e3), I want to compute w1*e1 + w2*e2 + w3*e3, where the weights w1, w2, w3 are attention scores.
Can anyone help?

Not without some changes to the backend code. However, you can easily achieve the same effect using nn.Embedding, albeit more slowly.
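A minimal sketch of that approach, assuming a bag of three indices and a simple learnable scoring layer as the attention mechanism (the sizes and the scoring function are illustrative, not from the original post):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration.
num_embeddings, dim = 10, 4
emb = nn.Embedding(num_embeddings, dim)

# One "bag" of three indices, standing in for e1, e2, e3.
idx = torch.tensor([1, 2, 3])
e = emb(idx)                        # shape (3, dim)

# Attention weights from a learnable scoring layer (any scoring
# function works); softmax makes them sum to 1.
score = nn.Linear(dim, 1, bias=False)
w = torch.softmax(score(e), dim=0)  # shape (3, 1)

# Weighted sum w1*e1 + w2*e2 + w3*e3 instead of a plain mean.
weighted_sum = (w * e).sum(dim=0)   # shape (dim,)
```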

I see. Any plan to implement this?

I think using nn.Embedding for this would be too slow and inefficient.

Why would it be too slow? It is just an additional weighted sum of vectors; it's not as if you need to extract the embeddings one by one.

In the documentation, it says:

nn.EmbeddingBag is much more time and memory efficient than using a chain of these operations.

Yes, I know, and I mentioned in my first reply that it will be slower. Of course EmbeddingBag is more efficient; otherwise it wouldn't even exist. I'm just saying that using nn.Embedding for this is likely not "too slow" and probably not a huge performance bottleneck for your particular use case.

If you are really concerned, you can run some tests and measure the time difference. If it is crucial to you, you can open a GitHub feature request and see if anyone wants to implement it, or you can do it yourself.


Thanks a lot for your support!

Another quick question.

If I want to replicate the behavior of nn.EmbeddingBag with nn.Embedding, is padding the only option?
Assuming the following are the inputs to nn.EmbeddingBag:

idx = [0,3,2,4,5,9,23]
offset = [0,2,4]

In this case, the actual input to nn.Embedding is [[0,3],[2,4],[5,9,23]].

So to make a batch from this input, do I need to pad?

I think I can increase the indices by 1 and pad with 0 up to the max length of the inputs, which is 3 in this case, using the padding_idx option in nn.Embedding.
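The shift-and-pad idea above can be sketched like this, using the indices from the question (the embedding dimension of 8 is an arbitrary choice for illustration):

```python
import torch
import torch.nn as nn

# The bags from the question, grouped by the offsets.
idx = [[0, 3], [2, 4], [5, 9, 23]]

# Shift all indices up by 1 so that 0 is free to act as padding.
shifted = [[i + 1 for i in seq] for seq in idx]
max_len = max(len(seq) for seq in shifted)   # 3 here
padded = torch.tensor(
    [seq + [0] * (max_len - len(seq)) for seq in shifted]
)                                            # shape (3, 3)

# padding_idx=0 keeps the pad row's embedding at zero, so summing
# over dim=1 ignores the padding.
emb = nn.Embedding(25, 8, padding_idx=0)
bag_sum = emb(padded).sum(dim=1)             # shape (3, 8)
```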

However, my concern is that if the max length is large, I would have to store a lot of 0s, which would be very memory inefficient.

Is there any way that I can achieve this efficiently?

I doubt it would be too bad with respect to storing 0s. Unlike TensorFlow, you only have to pad up to the length of the longest example in the batch. If it were really important not to pad too much per batch, you could sort your examples by length and then create your batches. This is a common technique for making RNN models more efficient.
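The sort-by-length trick can be sketched in a few lines; the sequences and batch size here are hypothetical:

```python
# Sorting sequences by length before batching keeps the padding
# per batch small. Hypothetical data for illustration.
seqs = [[0, 3], [5, 9, 23], [2], [7, 8, 1, 4]]

seqs_sorted = sorted(seqs, key=len)
batch_size = 2
batches = [seqs_sorted[i:i + batch_size]
           for i in range(0, len(seqs_sorted), batch_size)]
# Each batch now only pads to the length of its own longest
# sequence, e.g. [[2], [0, 3]] pads to length 2, not 4.
```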

That being said, in at least the NLP models where I have used embeddings/embedding bags, that part is quite fast compared to the rest of the model. I would recommend implementing the model first, and optimizing only if it is still too slow, rather than worrying about it right away.

You can flatten the indices, retrieve all the embeddings in one lookup, and then recover the embeddings for each sequence using the sequence lengths.
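A sketch of that flatten-then-split approach, using the indices and offsets from the question (the embedding dimension of 8 is again an arbitrary choice):

```python
import torch
import torch.nn as nn

idx = torch.tensor([0, 3, 2, 4, 5, 9, 23])  # flat indices from the question
lengths = [2, 2, 3]                          # derived from the offsets [0, 2, 4]

emb = nn.Embedding(25, 8)
flat = emb(idx)                              # shape (7, 8), one lookup for all bags

# Split back into per-bag chunks and reduce each one; replacing
# .sum with a weighted reduction gives the attention variant.
bags = torch.split(flat, lengths, dim=0)
out = torch.stack([b.sum(dim=0) for b in bags])   # shape (3, 8)
```

This avoids padding entirely: no zero rows are stored, and only the per-bag reduction loops over the (few) bags.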