Suppose I have a tensor and want to apply a certain function to buckets of its elements. For example, suppose I have the tensor

>>> A
0
2
0
2
[torch.FloatTensor of size 4x1]

and I want to compute the mean of every “bucket” of two elements and replace each element in the bucket with that mean, like so:

>>> for idx in range(0, 4, 2):
...     A[idx:idx+2] = torch.mean(A[idx:idx+2])
...
>>> A
1
1
1
1
[torch.FloatTensor of size 4x1]

The issue is that the for loop may be very slow, since its body has to execute A.numel()/2 times. Is there any way to parallelize it, so that the function runs over multiple buckets at the same time?

Note: the mean is just an example to clarify what I mean. The actual function is slower and more complicated.

@smth@antspy I’m sorry, I don’t understand the answer. I could bucket the tensor using unfold or view, but how do I then apply the function (mean in this case) to all the buckets in parallel?
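
For what it's worth, here is a minimal sketch of how the bucketing idea could work with reshape (or view), assuming the number of elements is divisible by the bucket size and that the per-bucket function can be written as a reduction along a dimension. The helper name apply_per_bucket and its arguments are made up for illustration and are not part of PyTorch. Each bucket becomes one row of a 2-D tensor, so a single call that reduces along dim=1 handles every bucket at once with no Python loop.

```python
import torch


def apply_per_bucket(A, bucket_size, fn):
    """Hypothetical helper: apply fn to every bucket of consecutive elements.

    Assumes A.numel() is divisible by bucket_size, and that fn maps a
    (num_buckets, bucket_size) tensor to a (num_buckets, 1) tensor,
    i.e. it reduces along dim=1 for all buckets in one call.
    """
    buckets = A.reshape(-1, bucket_size)  # one row per bucket
    reduced = fn(buckets)                 # one value per bucket
    # Broadcast each bucket's value back over its elements and restore A's shape.
    return reduced.expand_as(buckets).reshape(A.shape)


A = torch.tensor([[0.0], [2.0], [0.0], [2.0]])
result = apply_per_bucket(A, 2, lambda b: b.mean(dim=1, keepdim=True))
print(result)
```

This only removes the loop if the real function can itself be expressed with tensor operations that act along dim=1. A function that has to run arbitrary Python code per bucket would still need a different approach.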