Custom mean of tensor partitions

mamadpierre · July 13, 2021, 6:53pm

I have a tensor of size (n1, n2, n3) and a list lst of length n2 with integer values in [0, 1, ..., n2-1] that serve as partition (or cluster) assignments. What I want is an output of size (n1, len(set(lst)), n3) where each entry along the second dimension is the mean of the slices that belong to the same partition (cluster). For instance, imagine we have a tensor a of size (50, 5, 16) and lst = [0, 1, 1, 1, 2]. I want an output b of size (50, 3, 16) where b[:, 0, :] = a[:, 0, :], b[:, 1, :] = mean of (a[:, 1, :], a[:, 2, :], a[:, 3, :]) and b[:, 2, :] = a[:, 4, :].

What would be an efficient way of implementing this in PyTorch? I appreciate your time.

tom · July 13, 2021, 7:10pm

I’d look at the third-party package PyTorch Scatter (torch_scatter). Its scatter function has a reduce="mean" mode.
You need to convert lst to a tensor and possibly use broadcasting.
Now, the scatter implementation uses atomics, which can be problematic, e.g. in terms of performance. If the partitions are ordered, so that each cluster covers a contiguous range of indices (as your example suggests), you might benchmark it against just doing a for loop and taking the mean over each slice.
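To make the two options concrete, here is a minimal sketch using only core PyTorch. It is not the PyTorch Scatter API: the scatter-style variant swaps in `index_add_` plus `torch.bincount` to get the same kind of mean reduction, and the function names `partition_mean_scatter` and `partition_mean_loop` are made up for illustration. It assumes the labels are integers 0..K-1 with every label occurring at least once, and the loop variant additionally assumes the labels are sorted so that each partition is a contiguous slice.

```python
import torch


def partition_mean_scatter(a, labels):
    """Scatter-style mean along dim 1 (hypothetical helper, core PyTorch only).

    a:      tensor of shape (n1, n2, n3)
    labels: long tensor of shape (n2,) with values in 0..K-1
    """
    num_parts = int(labels.max()) + 1
    # Accumulate the sum of all slices that share a label.
    sums = torch.zeros(a.shape[0], num_parts, a.shape[2],
                       dtype=a.dtype, device=a.device)
    sums.index_add_(1, labels, a)
    # Number of slices per label; clamp avoids division by zero for unused labels.
    counts = torch.bincount(labels, minlength=num_parts).clamp(min=1)
    # Broadcast the counts over the first and last dimensions.
    return sums / counts.view(1, -1, 1).to(a.dtype)


def partition_mean_loop(a, labels):
    """Loop over contiguous slices; assumes labels are sorted (e.g. [0, 1, 1, 1, 2])."""
    _, counts = torch.unique_consecutive(labels, return_counts=True)
    means, start = [], 0
    for c in counts.tolist():
        # Mean over the slice of dim 1 that belongs to this partition.
        means.append(a[:, start:start + c, :].mean(dim=1))
        start += c
    return torch.stack(means, dim=1)


# The example from the question.
a = torch.randn(50, 5, 16)
labels = torch.tensor([0, 1, 1, 1, 2])

b_scatter = partition_mean_scatter(a, labels)
b_loop = partition_mean_loop(a, labels)
```

Both calls should give a tensor of size (50, 3, 16). Which one is faster likely depends on the number of partitions and the device, so it is worth timing both on your actual data.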

Best regards

Thomas