Computing averages

Hi there,
I have a data matrix x of dim n x d, where n is the batch dimension and d is the dimensionality of my data, and a matrix m of dim d x m.
I want to use m to compute averages over certain dimensions, e.g. x @ m => n x m.
Therefore m[j, i] = 1.0/i_z for j in [idx_i_1, idx_i_2, …, idx_i_z], and 0 otherwise.
In other words, in column i of m the value 1.0/i_z repeats i_z times (once per index in the group) and the matrix is 0 everywhere else.
Is there a clever way to produce such a matrix using views?

Could you post a (slow) reference code using nested loops, please?

Gladly:

    # slow reference using nested loops
    import torch

    indices = [[0, 1], [2]]
    x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    res = torch.zeros((3, 2))
    for i in range(3):
        for j in range(2):
            res[i, j] = torch.sum(x[i, indices[j]]) / len(indices[j])
    print(res)

    # with the averaging matrix described above
    avg_mat = torch.zeros((2, 3))
    for i, idx in enumerate(indices):
        avg_mat[i, idx] = 1.0 / len(idx)
    print(x.to(torch.float) @ avg_mat.T)
    print(avg_mat.T)

Output:

tensor([[1.5000, 3.0000],
        [4.5000, 6.0000],
        [7.5000, 9.0000]])
tensor([[1.5000, 3.0000],
        [4.5000, 6.0000],
        [7.5000, 9.0000]])
tensor([[0.5000, 0.0000],
        [0.5000, 0.0000],
        [0.0000, 1.0000]])
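
The loop over indices when building avg_mat can also be avoided. Here is a minimal sketch using advanced indexing rather than views (the helper names rows, cols, and counts are just illustrative, not from the original post):

    # build avg_mat without a Python loop:
    # place 1/|group| at every (group, member) position
    rows = torch.tensor(sum([[i] * len(idx) for i, idx in enumerate(indices)], []))
    cols = torch.tensor(sum(indices, []))
    counts = torch.tensor([len(idx) for idx in indices], dtype=torch.float)

    avg_mat = torch.zeros(len(indices), 3)
    avg_mat[rows, cols] = 1.0 / counts[rows]
    print(avg_mat)
    # tensor([[0.5000, 0.5000, 0.0000],
    #         [0.0000, 0.0000, 1.0000]])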

I found out about torch.scatter, maybe that's a good solution to my troubles…

Yes, .scatter_reduce_ should work, but you would need to modify your index and compute the group counts:

    import torch
    # reusing `indices` and `x` from the snippet above

    # group id for every input dim: [0, 0, 1]
    idx = torch.tensor(sum([[i] * len(a) for i, a in enumerate(indices)], []))
    _, count = idx.unique(return_counts=True)
    idx = idx.unsqueeze(0).expand(3, 3)

    out = torch.zeros(idx.size(0), idx.max() + 1, dtype=x.dtype)
    out.scatter_reduce_(1, idx, x, reduce="sum")
    out = out / count

    print(out)
    # tensor([[1.5000, 3.0000],
    #         [4.5000, 6.0000],
    #         [7.5000, 9.0000]])
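
As a side note, recent PyTorch versions also accept reduce="mean" together with include_self=False in scatter_reduce_, which skips the manual division by count. A sketch under that assumption (casting x to float so the averaged values keep their fractional part):

    xf = x.to(torch.float)
    out = torch.zeros(idx.size(0), idx.max() + 1)
    # include_self=False so the initial zeros in `out` do not enter the mean
    out.scatter_reduce_(1, idx, xf, reduce="mean", include_self=False)
    print(out)
    # tensor([[1.5000, 3.0000],
    #         [4.5000, 6.0000],
    #         [7.5000, 9.0000]])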

Thank you very much!