Multiple sums based on index

Given a 3D tensor A of size

(n, i, j)

And an index tensor of size n with a pattern like:

[1, 2, 2, 3, 3, 3, 4, 4, 4, 4 …]

I want to sum over the n dimensions of A according to the index tensor.

For example, if A is (6, 2, 2)

tensor([[[0.4, 0.4],
         [0.6, 0.3]],

        [[0.9, 0.3],
         [0.5, 0.6]],

        [[0.4, 0.1],
         [0.4, 0.8]],

        [[0.6, 0.3],
         [0.7, 0.1]],

        [[0.0, 0.9],
         [0.4, 0.7]],

        [[0.0, 0.4],
         [0.5, 0.7]]])

I want to sum A[1] + A[2] and A[3] + A[4] + A[5].
The result has size (3,2,2) and is equal to:

tensor([[[0.4, 0.4],
        [0.6, 0.3]],

       [[1.3, 0.4],
        [0.9, 1.4]],

       [[0.6, 1.6],
        [1.6, 1.5]]])

Which is the best way to do this, for large n ?

Hi Fst!

For efficiency for large n, the best way will be to compute the
cumulative sum of A (along the 0 dimension) and then obtain
the desired partial sums as differences of the appropriate cumulative
sums.

The cumulative sum requires (about) n additions, but you have
to perform (about) n additions in any event, so packaging the
computation as a cumulative sum probably lets pytorch make the
most efficient use of pipelining.

The rest is just monkeying around to index into the cumulative sum.

Consider:

>>> import torch
>>> print (torch.__version__)
1.12.0
>>>
>>> # example tensor
>>> A = torch.tensor ([[[0.4, 0.4],
...                     [0.6, 0.3]],
...                    [[0.9, 0.3],
...                     [0.5, 0.6]],
...                    [[0.4, 0.1],
...                     [0.4, 0.8]],
...                    [[0.6, 0.3],
...                     [0.7, 0.1]],
...                    [[0.0, 0.9],
...                     [0.4, 0.7]],
...                    [[0.0, 0.4],
...                     [0.5, 0.7]]])
>>>
>>> n = A.size (0)
>>>
>>> # assume that n is a "triangle number"
>>> k = int ((2 * n)**0.5)
>>>
>>> seq = torch.arange (1, k + 1)
>>> ind_up = ((seq * (seq + 1)) / 2).long() - 1
>>> ind_lo = ((seq * (seq - 1)) / 2).long() - 1   # incorrect negative index will be masked out
>>> msk = (ind_lo >= 0).float()  # mask to get correct zero-th slice of result
>>>
>>> Acs = A.cumsum (0)   # cumulative sum of slices
>>> res = Acs[ind_up] - msk[:, None, None] * Acs[ind_lo]   # clever differences of cumulative sums
>>>
>>> res   # final result
tensor([[[0.4000, 0.4000],
         [0.6000, 0.3000]],

        [[1.3000, 0.4000],
         [0.9000, 1.4000]],

        [[0.6000, 1.6000],
         [1.6000, 1.5000]]])

Best.

K. Frank