Efficient way of calculating sum of unequal sized chunks of tensor

I am wondering if there is a way of calculating the following sum over unequal-sized chunks of a tensor.

import torch
import numpy as np

x = torch.rand(1000, 100)
y = np.unique(np.random.choice(1000, 10))

Here I have a tensor x of size (1000, 100), and I want to calculate the sum of chunks along the first axis. The chunks are split along the first axis, and y indicates the (exclusive) end row of each chunk. The chunks are in general of unequal size. For example, I can do this with the following for loop:

cum_pos_lik = torch.FloatTensor(y.size, 100)  # one output row per chunk
y = np.append(0, y)  # prepend 0 so y[i-1]:y[i] spans chunk i-1
for i in range(1, y.size):
    cum_pos_lik[i - 1, :] = x[y[i - 1]:y[i], :].sum(0)

But I need this to be faster for my application. Clearly the sums of the individual chunks can be computed in parallel. Is there a simple way of doing this in PyTorch?


If you want to do it on the CPU, you might want to look into torch.multiprocessing, but I'm not sure there is a better way using only tensor operations. A naive approach of padding each subtensor with 0s so that they all have the same size would possibly be slower.

Thanks for your reply. I am actually doing this on the GPU. I think the only way is to pad the chunks and sum along an axis. I would assume that if I am doing a for loop anyway, doing only the padding within the loop would be faster? In any case I will try it out. Thanks a lot!
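For reference, here is a minimal sketch of the padding idea using torch.split and pad_sequence; the chunk boundaries in y are illustrative, and the variable names mirror the original loop:

```python
import numpy as np
import torch
from torch.nn.utils.rnn import pad_sequence

x = torch.rand(1000, 100)
y = np.array([100, 350, 600, 1000])  # hypothetical exclusive chunk end rows

sizes = np.diff(np.append(0, y)).tolist()        # unequal chunk lengths
chunks = list(torch.split(x, sizes))             # list of (len_i, 100) tensors
padded = pad_sequence(chunks, batch_first=True)  # (n_chunks, max_len, 100), zero-padded
cum_pos_lik = padded.sum(1)                      # (n_chunks, 100) per-chunk sums
```

The zero padding does not change the sums, but it does allocate and touch a (n_chunks, max_len, 100) buffer, so with very skewed chunk sizes this can indeed be slower than alternatives.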

You can try the following (note that since y holds exclusive end indices, the cumulative sums have to be taken at y - 1; sums0 holds the cumulative sum just before each chunk starts):

sums = x.cumsum(0)              # running sum over rows
sums = sums[y - 1, :]           # cumulative sum at each chunk's last row
nrows = y.size
sums0 = torch.zeros_like(sums)  # cumulative sum just before each chunk
index = torch.arange(1, nrows)
sums0.index_copy_(0, index, sums[:nrows - 1, :])
cum_pos_lik = sums - sums0      # per-chunk sums as differences
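As a sanity check, here is a self-contained version of the cumsum-difference trick compared against the explicit loop; the end indices in y are illustrative, and float64 is used so the cumulative-sum round-off stays negligible for the comparison:

```python
import numpy as np
import torch

x = torch.rand(1000, 100, dtype=torch.float64)
y = np.array([100, 350, 600, 1000])  # hypothetical exclusive chunk ends

sums = x.cumsum(0)[y - 1, :]  # cumulative sum at each chunk's last row
sums0 = torch.zeros_like(sums)
sums0[1:] = sums[:-1]         # cumulative sum just before each chunk starts
cum_pos_lik = sums - sums0    # per-chunk sums as differences

# reference: explicit loop over chunks
starts = np.append(0, y[:-1])
ref = torch.stack([x[a:b].sum(0) for a, b in zip(starts, y)])
print(torch.allclose(cum_pos_lik, ref))  # True
```

The shifted copy via slicing (`sums0[1:] = sums[:-1]`) is equivalent to the index_copy_ call above, just a different spelling.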

Hi ngimel,

Thank you for your reply! However, I have tried something similar before; the problem is that for my real data size, cumsum is actually the bottleneck.
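If cumsum is the bottleneck, one alternative that avoids it entirely is a segment sum via index_add_, which is a single scatter-add and parallelizes well on the GPU (though it uses atomics there, so results may be nondeterministic in the last bits). A sketch, with illustrative boundaries in y:

```python
import numpy as np
import torch

x = torch.rand(1000, 100)
y = np.array([100, 350, 600, 1000])  # hypothetical exclusive chunk ends

sizes = torch.as_tensor(np.diff(np.append(0, y)))               # chunk lengths
seg_ids = torch.repeat_interleave(torch.arange(len(y)), sizes)  # row -> chunk id
cum_pos_lik = torch.zeros(len(y), x.size(1))
cum_pos_lik.index_add_(0, seg_ids, x)  # scatter-add each row into its chunk's sum
```

Unlike the cumsum approach, this touches each element of x exactly once and never materializes a running sum over all 1000 rows.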

Hi @atanaka7, did you find a solution to this problem? I am in the same situation.