My goal is to pool (max, min, avg, etc.) every sequence into one vector.
For example, given the two sentences “I love Outman. I love Superman, too.”, after encoding, the first sentence has length 5 (including punctuation) and the second has length 6. I want to pool these two sentences into 2 vectors, and both sentences must be encoded in one pass.
A simple idea is to use the `scatter_` function, but I don't know how to apply it here.
# To simplify this question, I don't use batch.
# suppose we have the sentence embeddings; 11 = 5 + 6 is the total number of tokens, 20 is the embedding dim.
sentence_embedding = torch.Tensor(11, 20)
# I have a split flag where 1 marks the first token of each sentence:
sentence_split_flag = torch.LongTensor([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
....
# After some operations (depending on the pooling method), I want to get:
pooled_sentence_embedding = torch.Tensor(2, 20)
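One way to do this without any padding, as a minimal sketch (the variable names follow the snippet above; the use of mean pooling and `index_add_` is my assumption, not something fixed by the question): turn the start flags into per-token sentence ids with `cumsum`, then sum each sentence's token vectors and divide by its length.

```python
import torch

# Dummy inputs matching the shapes above.
sentence_embedding = torch.randn(11, 20)
sentence_split_flag = torch.LongTensor([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])

# cumsum turns the start flags into a 0-based sentence id per token.
segment_ids = sentence_split_flag.cumsum(0) - 1   # tensor([0,0,0,0,0,1,1,1,1,1,1])
num_sentences = int(segment_ids.max().item()) + 1

# Sum the token vectors of each sentence, then divide by the token count.
sums = torch.zeros(num_sentences, 20).index_add_(0, segment_ids, sentence_embedding)
counts = torch.zeros(num_sentences).index_add_(0, segment_ids, torch.ones(11))
pooled_sentence_embedding = sums / counts.unsqueeze(1)   # shape (2, 20)
```

This handles any number of sentences of any lengths in one pass, since `index_add_` scatters every token row to its sentence slot in a single call.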
And I'm sorry for my poor English; if anything is unclear, I will do my best to explain it.
Yes, it works. But I think there is a more efficient method.
Maybe I could scatter the sentence embeddings, whose shape is (11, 20), into (2, 6, 20), where any sentence shorter than 6 is padded. Then pool (2, 6, 20) down to (2, 1, 20), and reshape to (2, 20).
So I want to know a quick method to get the scatter index from sentence_split_flag or sentence_segment.
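A sketch of that index computation, under my own assumptions (max pooling, and padding with `-inf` so padded slots never win the max): `cumsum` on the start flags gives each token's sentence id, and subtracting each sentence's start offset gives each token's position within its sentence. Those two index tensors are exactly what the padded scatter needs.

```python
import torch

sentence_embedding = torch.randn(11, 20)
sentence_split_flag = torch.LongTensor([1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0])

# Sentence id of every token: cumulative sum of the start flags, shifted to 0-based.
segment_ids = sentence_split_flag.cumsum(0) - 1         # tensor([0,0,0,0,0,1,1,1,1,1,1])
# Position of every token inside its own sentence.
starts = torch.nonzero(sentence_split_flag).squeeze(1)  # tensor([0, 5])
positions = torch.arange(11) - starts[segment_ids]      # tensor([0,1,2,3,4,0,1,2,3,4,5])

num_sentences, max_len = 2, 6
# Pad with -inf so the padded slots cannot affect the max.
padded = torch.full((num_sentences, max_len, 20), float('-inf'))
padded[segment_ids, positions] = sentence_embedding     # scatter tokens into place
pooled_sentence_embedding = padded.max(dim=1).values    # shape (2, 20)
```

For avg pooling the same indices work, but the pad value should be 0 and the sum along dim 1 should be divided by each sentence's true length rather than by `max_len`.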