Hi,
I am working on implementing an attention-based summarization model. Let’s assume that batch_size=2, vocabulary_size=4, and sequence_length=3. I want to accumulate the attention weights into a probs matrix and then use this matrix to sample words. Here is an example.
word_indices (batch_size, sequence_length)
[[0, 2, 3],
 [0, 1, 1]]   # duplicate word index 1 in the second row
attn_weights (batch_size, sequence_length)
[[0.1, 0.3, 0.6],   # e.g. 0.3 is the attention weight for word index 2
 [0.7, 0.1, 0.2]]
probs (batch_size, vocabulary_size)
[[0, 0, 0, 0],
 [0, 0, 0, 0]]
batch_indices  = [0, 0, 0, 1, 1, 1]
word_indices   = [0, 2, 3, 0, 1, 1]
repeat_indices = [0, 1, 2, 0, 1, 2]
# Accumulate attention weights
probs[batch_indices, word_indices] += attn_weights[batch_indices, repeat_indices]
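Assuming NumPy- or PyTorch-style advanced indexing (the post does not name the framework), the line above is not an element-by-element accumulation. Python expands `probs[idx] += v` into three steps: read a copy of the indexed entries, add `v` to that copy, then write the copy back. The pure-Python sketch below mimics those steps; `emulate_fancy_iadd` is a hypothetical helper, and it writes back in order so the last duplicate wins, matching what the post observed (the libraries themselves do not promise which duplicate wins).

```python
# Pure-Python emulation of `probs[batch_indices, word_indices] += values`
# under non-accumulating advanced indexing: gather -> add on a copy -> write back.

batch_indices = [0, 0, 0, 1, 1, 1]
word_indices = [0, 2, 3, 0, 1, 1]
repeat_indices = [0, 1, 2, 0, 1, 2]
attn_weights = [[0.1, 0.3, 0.6],
                [0.7, 0.1, 0.2]]


def emulate_fancy_iadd(probs, rows, cols, values):
    """Hypothetical helper mimicking `probs[rows, cols] += values`."""
    # 1. __getitem__: advanced indexing returns a copy of the selected entries.
    gathered = [probs[r][c] for r, c in zip(rows, cols)]
    # 2. __iadd__: the addition happens on that temporary copy.
    updated = [g + v for g, v in zip(gathered, values)]
    # 3. __setitem__: results are written back one by one, so a later write
    #    to a repeated (row, col) pair overwrites the earlier one.
    for r, c, u in zip(rows, cols, updated):
        probs[r][c] = u
    return probs


probs = [[0.0] * 4 for _ in range(2)]
values = [attn_weights[b][r] for b, r in zip(batch_indices, repeat_indices)]
emulate_fancy_iadd(probs, batch_indices, word_indices, values)
print(probs)  # [[0.1, 0.0, 0.3, 0.6], [0.7, 0.2, 0.0, 0.0]]
```

Both updates to (1, 1) read the same starting value 0.0, so the second write stores 0.0 + 0.2 and the 0.1 contribution is lost.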
I want to get the following result:
probs = [[0.1, 0,   0.3, 0.6],
         [0.7, 0.3, 0,   0  ]]
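The desired probs follow from a simple rule: every (batch, position) pair adds its attention weight into the slot of its word, so repeated word indices within a row are summed. A plain Python loop pins down these semantics; `accumulate_attention` is a hypothetical name, and the loop is only a reference for checking a vectorized version, not something to use on real tensors.

```python
# Reference semantics: probs[b][word_indices[b][j]] += attn_weights[b][j]
# for every batch row b and sequence position j.

word_indices = [[0, 2, 3],
                [0, 1, 1]]
attn_weights = [[0.1, 0.3, 0.6],
                [0.7, 0.1, 0.2]]
vocab_size = 4


def accumulate_attention(word_indices, attn_weights, vocab_size):
    """Hypothetical reference implementation of the desired accumulation."""
    probs = [[0.0] * vocab_size for _ in word_indices]
    for b, (words, weights) in enumerate(zip(word_indices, attn_weights)):
        for w, a in zip(words, weights):
            # Each update reads the current value, so duplicates add up.
            probs[b][w] += a
    return probs


probs = accumulate_attention(word_indices, attn_weights, vocab_size)
print(probs)
```

The second row's word 1 receives 0.1 + 0.2 (about 0.3, up to floating-point rounding), which is the value the indexed `+=` fails to produce.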
However, I get this instead:
probs = [[0.1, 0,   0.3, 0.6],
         [0.7, 0.2, 0,   0  ]]
The problem is that when an index pair is duplicated ((1, 1) appears twice in (batch_indices, word_indices)), only the value for the last occurrence is applied, so probs[1, 1] ends up as 0.2 instead of 0.1 + 0.2 = 0.3.
How can I get the desired result?
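A minimal sketch, assuming the tensors are PyTorch tensors (the post does not say which framework it uses). PyTorch has two accumulating writes that fit here: `Tensor.scatter_add_` along the vocabulary dimension, which works directly on the 2-D `word_indices`, and `Tensor.index_put_` with `accumulate=True`, which keeps the flattened index form from the question. `flat_word_indices` and `probs2` are hypothetical names used only to keep the two variants apart.

```python
import torch

# Example data from the question (batch_size=2, sequence_length=3, vocabulary_size=4).
word_indices = torch.tensor([[0, 2, 3],
                             [0, 1, 1]])
attn_weights = torch.tensor([[0.1, 0.3, 0.6],
                             [0.7, 0.1, 0.2]])
batch_size, vocab_size = word_indices.size(0), 4

# Option 1: scatter_add_ along dim 1.
# For every (i, j): probs[i, word_indices[i, j]] += attn_weights[i, j],
# and contributions that land on the same slot are summed.
probs = torch.zeros(batch_size, vocab_size)
probs.scatter_add_(1, word_indices, attn_weights)

# Option 2: the flattened indices from the question, written through
# index_put_ with accumulate=True so duplicate (batch, word) pairs add up.
batch_indices = torch.tensor([0, 0, 0, 1, 1, 1])
flat_word_indices = torch.tensor([0, 2, 3, 0, 1, 1])
repeat_indices = torch.tensor([0, 1, 2, 0, 1, 2])
probs2 = torch.zeros(batch_size, vocab_size)
probs2.index_put_((batch_indices, flat_word_indices),
                  attn_weights[batch_indices, repeat_indices],
                  accumulate=True)

print(probs)
print(probs2)
```

`scatter_add_` avoids building the flattened index tensors at all. With `accumulate=False`, PyTorch documents the result of `index_put_` as undefined when indices repeat, which is consistent with the overwrite seen in the post. If the arrays are NumPy arrays instead, `np.add.at(probs, (batch_indices, word_indices), values)` performs the same unbuffered accumulation.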
Thanks,