Masked Summation of Tensor

I have:
PRE EMBEDDING:
`input.size()` -- (2406L, 2L) -- (word_id, batch_size)

```
   99    99
  489  1171
  281  1317
     ⋮
    0   435
    0  2741
    0   517
[torch.LongTensor of size 2406x2]
```

POST EMBEDDING:
`context_output.size()` -- (2406L, 2L, 1L) -- (word_sequence, batch_size, attention_score)

```
Variable containing:
( 0 ,.,.) = 
1.00000e-02 *
  2.0804
  1.6674
  2.9782
   ⋮
  4.8565
  4.8565
  4.8565
...
( 1 ,.,.) = 
1.00000e-02 *
  0.2246
  1.4224
  4.1816
   ⋮
  4.4363
  3.0162
  3.3986
[torch.FloatTensor of size 2x2406x1]
```


I need to sum the values in the POST EMBEDDING tensor if they have the same `word_id` in the PRE EMBEDDING tensor.

Example:

PRE EMBEDDING
```
Variable containing:
   4    1
   4    2
   2    2
[torch.LongTensor of size 3x2]
```

POST EMBEDDING:

```
Variable containing:
( 0 ,.,.) =
  0.35
  0.35
  0.65
...
( 1 ,.,.) = 
  0.25
  0.65
  0.65
[torch.FloatTensor of size 2x3x1]
```

DESIRED RESULT

```
Variable containing:
( 0 ,.,.) =
   4.0    0.7
   2.0    0.65
   1.0    0.0
...
( 1 ,.,.) =
   4.0    0.0
   2.0    1.3
   1.0    0.25
[torch.LongTensor of size 2x3x2]
```

There's no built-in method that can do that. It also seems odd that you want the desired result to have columns of (word_idx, sum): the first column is integral while the other is floating point, and tensors are always homogeneous, so I don't think any numerical package has a method for that. You'll have to process the tensors using your own Python code. If you want to make it differentiable, you can create your own autograd Function.
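
For the summation part itself, one option is to accumulate the scores into a dense per-word buffer with `scatter_add`. Here is a minimal sketch, assuming the word ids are non-negative integers small enough to index a `(batch_size, num_words)` tensor; `word_ids` and `scores` are placeholder names standing in for your PRE EMBEDDING and POST EMBEDDING tensors, laid out as in your example:

```python
import torch

# word_ids: (seq_len, batch) LongTensor of word ids   (PRE EMBEDDING layout)
# scores:   (batch, seq_len, 1) FloatTensor of scores (POST EMBEDDING layout)
word_ids = torch.tensor([[4, 1],
                         [4, 2],
                         [2, 2]])
scores = torch.tensor([[[0.35], [0.35], [0.65]],
                       [[0.25], [0.65], [0.65]]])

batch_size = scores.size(0)
num_words = int(word_ids.max()) + 1   # assumes ids run from 0 to max

# scatter_add sums scores[b, j] into column word_ids[j, b] of row b.
# The out-of-place version is differentiable w.r.t. `scores`, so no
# custom autograd Function is needed for the summation itself.
sums = torch.zeros(batch_size, num_words).scatter_add(
    1, word_ids.t(), scores.squeeze(-1))

print(sums)
# tensor([[0.0000, 0.0000, 0.6500, 0.0000, 0.7000],
#         [0.0000, 0.2500, 1.3000, 0.0000, 0.0000]])
```

Row `b`, column `w` then holds the summed score for word id `w` in batch element `b`, which matches the sums in your desired result. To get something like (word_id, sum) pairs you would keep the ids and the sums in two separate tensors, since a single tensor cannot mix integer and floating-point columns.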
