Question about broadcasting (Attention mechanism)

Hi,

I am working on implementing an attention-based summarization model. Let's assume that batch_size=2, vocabulary_size=4, and sequence_length=3. I want to accumulate attention weights into a probs matrix and then use this matrix to sample words. Below is an example.

word_indices (batch_size, sequence_length)
 [ 0   2   3
   0   1   1 ]        # duplicate word index 1 in the second row

attn_weights (batch_size, sequence_length)
 [ 0.1   0.3   0.6    # e.g., 0.3 is the attention weight for word index 2
   0.7   0.1   0.2 ]

probs (batch_size, vocabulary_size), initialized to zeros
 [ 0   0   0   0
   0   0   0   0 ]


import torch

# Assuming PyTorch tensors (NumPy fancy indexing shows the same issue)
attn_weights = torch.tensor([[0.1, 0.3, 0.6],
                             [0.7, 0.1, 0.2]])
probs = torch.zeros(2, 4)
batch_indices  = [0, 0, 0, 1, 1, 1]
word_indices   = [0, 2, 3, 0, 1, 1]
repeat_indices = [0, 1, 2, 0, 1, 2]

# Accumulate attention weights into probs
probs[batch_indices, word_indices] += attn_weights[batch_indices, repeat_indices]

This is the result I want:

probs = [ 0.1  0    0.3  0.6
          0.7  0.3  0    0   ]    # 0.3 = 0.1 + 0.2 accumulated at index (1, 1)

However, this is the result I actually get:

probs = [ 0.1  0    0.3  0.6
          0.7  0.2  0    0   ]    # only 0.2, the last write to index (1, 1)

The problem is that when an index is duplicated ((1, 1) appears twice), only the value from the last occurrence is written instead of the sum.
How can I get the desired result?
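
To show the kind of behavior I am after, here is a tentative sketch (assuming PyTorch; index_put_ with accumulate=True is documented to add values that land on the same position, but I am not sure this is the idiomatic solution):

# Tentative sketch: accumulate=True should sum values at duplicate
# indices instead of overwriting them.
probs = torch.zeros(2, 4)
probs.index_put_(
    (torch.tensor(batch_indices), torch.tensor(word_indices)),
    attn_weights[batch_indices, repeat_indices],
    accumulate=True,
)
print(probs)   # hoping for [[0.1, 0, 0.3, 0.6], [0.7, 0.3, 0, 0]]

(If this were NumPy, np.add.at looks like the analogous unbuffered operation.)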
Thanks,
