Efficient way to do a strided sum and reduce an n x m tensor to an n-vector

Typically m << n. For example, with n=3, m=2, I want to reduce a 3x2 tensor to a 3-vector as follows. First, write the 3x2 tensor as

00 01 xx
xx 10 11
21 xx 20

where I’m only indicating the indices and xx marks elements that don’t exist, and then take a column sum to obtain the 3-vector
[00 + 21, 01 + 10, 11 + 20]

Or without the wrap-around, write it as
00 01 xx xx
xx 10 11 xx
xx xx 20 21
and again sum the columns to obtain
[00, 01 + 10, 11 + 20, 21][:-1]

This seems almost like a conv1d-like operation, or a torch.diag/torch.diagonal operation, perhaps something like
torch.sum( torch.stack( [ torch.roll( torch.diag( y[ :, i ] ), shifts=i, dims=1 ) for i in range( len( y[ 0 ] ) ) ] ), axis=0 ) ?
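For concreteness, here is the desired wrap-around result worked out by hand for the n=3, m=2 case (using arange(6) so that entry y[i, j] stands in for the index "ij"):

```python
import torch

y = torch.arange(6.).reshape(3, 2)  # y[i, j] stands in for index "ij"
# column sums of the wrapped layout: [00 + 21, 01 + 10, 11 + 20]
expected = torch.stack([y[0, 0] + y[2, 1],
                        y[0, 1] + y[1, 0],
                        y[1, 1] + y[2, 0]])
print(expected)  # tensor([5., 3., 7.])
```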

Thanks

PS: When I had spaces instead of xx in the original post, the spaces were eaten and the displayed version was not how I typed it.

Hi btnadiga!

One approach would be to append (cat()) a column of zeros to your
original tensor, use torch tensor indexing with appropriately constructed
indices to obtain your “shifted” tensor, and then perform the sum:

>>> import torch
>>> torch.__version__
'1.12.0'
>>> t = 10 * torch.arange (10).unsqueeze (1) + torch.arange (2)
>>> t
tensor([[ 0,  1],
        [10, 11],
        [20, 21],
        [30, 31],
        [40, 41],
        [50, 51],
        [60, 61],
        [70, 71],
        [80, 81],
        [90, 91]])
>>> t0 = torch.cat ((t, torch.zeros (10, 1, dtype = torch.long)), 1)
>>> t0
tensor([[ 0,  1,  0],
        [10, 11,  0],
        [20, 21,  0],
        [30, 31,  0],
        [40, 41,  0],
        [50, 51,  0],
        [60, 61,  0],
        [70, 71,  0],
        [80, 81,  0],
        [90, 91,  0]])
>>> t0[torch.arange (10).unsqueeze (1).expand (10, 3), (torch.arange (30).view (10, 3) - torch.arange (10).unsqueeze (1)) % 3]
tensor([[ 0,  1,  0],
        [ 0, 10, 11],
        [21,  0, 20],
        [30, 31,  0],
        [ 0, 40, 41],
        [51,  0, 50],
        [60, 61,  0],
        [ 0, 70, 71],
        [81,  0, 80],
        [90, 91,  0]])
>>> t0[torch.arange (10).unsqueeze (1).expand (10, 3), (torch.arange (30).view (10, 3) - torch.arange (10).unsqueeze (1)) % 3].sum (0)
tensor([333, 304, 273])

Best.

K. Frank

@KFrank, coming from numpy my solution was

import torch
n, m = 3, 2
y2d = torch.arange(6.).reshape(n, m)
print(y2d)
tmp = torch.sum ( torch.stack( [ torch.roll( torch.diag( y2d[:, i] ),
                                             shifts=i, dims=1 )
                                 for i in range(m) ] ), axis=(0, 1) )
print(tmp)

Any idea which is better in terms of speed?
Thanks!
Balu

Hi Balu!

If efficiency actually matters, I would suggest that you time both versions.

Using loop-free pytorch tensor operations will typically outperform (often
by a lot) an equivalent for-loop / list-comprehension computation. But
that’s just a rule of thumb that may or may not apply to your use case.
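A minimal sketch of such a timing comparison (the helper names are mine; the two snippets from this thread are wrapped as functions and timed with timeit, so the numbers will depend on your machine). Note that the two versions agree on the 3x2 example but generalize differently for larger n: the indexing version wraps shifts modulo m + 1, while the roll version wraps modulo n.

```python
import timeit
import torch

def indexing_version(t):
    # pad a zero column, gather with row-wise shifted column indices, sum
    n, m = t.shape
    t0 = torch.cat((t, torch.zeros(n, 1, dtype=t.dtype)), 1)
    rows = torch.arange(n).unsqueeze(1).expand(n, m + 1)
    cols = (torch.arange(n * (m + 1)).view(n, m + 1)
            - torch.arange(n).unsqueeze(1)) % (m + 1)
    return t0[rows, cols].sum(0)

def roll_version(t):
    # place each column on a shifted diagonal, stack, and sum everything down
    m = t.shape[1]
    return torch.sum(torch.stack([torch.roll(torch.diag(t[:, i]),
                                             shifts=i, dims=1)
                                  for i in range(m)]), dim=(0, 1))

y = torch.arange(6.).reshape(3, 2)
assert torch.equal(indexing_version(y), roll_version(y))  # both give tensor([5., 3., 7.])

print(timeit.timeit(lambda: indexing_version(y), number=10_000))
print(timeit.timeit(lambda: roll_version(y), number=10_000))
```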

Best.

K. Frank