Hi there,
I have a data matrix x of shape n x d, where n is the batch dimension and d is the dimensionality of my data, and a matrix m of shape d x m.
I want to use m to compute averages over certain groups of dimensions, e.g. x @ m => result of shape n x m.
That is, m[j, i] = 1.0/z_i for j in [idx_i_1, idx_i_2, …, idx_i_z_i] and 0 otherwise, where z_i is the number of input dimensions averaged into output dimension i.
As one can see, m contains far less information than its d x m entries suggest: the value 1.0/z_i simply repeats z_i times in column i, and the matrix is 0 everywhere else.
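For a concrete toy case (just to make the structure clear), with d = 3 and the two groups [0, 1] and [2] (so m = 2), the matrix would be

m = [[0.5, 0.0],
     [0.5, 0.0],
     [0.0, 1.0]]

so that x @ m averages input dimensions 0 and 1 into output dimension 0 and copies input dimension 2 into output dimension 1.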
Is there a clever way to produce such a matrix using views?
Could you post some (slow) reference code using nested loops, please?
Gladly:
# nested loops
import torch

indices = [[0, 1], [2]]
x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

res = torch.zeros((3, 2))
for i in range(3):
    for j in range(2):
        res[i, j] = torch.sum(x[i, indices[j]]) / len(indices[j])
print(res)

# with the matrix as previously described
avg_mat = torch.zeros((2, 3))
for i, idx in enumerate(indices):
    avg_mat[i, idx] = 1.0 / len(idx)
print(x.to(torch.float) @ avg_mat.T)
print(avg_mat.T)
—>
tensor([[1.5000, 3.0000],
[4.5000, 6.0000],
[7.5000, 9.0000]])
tensor([[1.5000, 3.0000],
[4.5000, 6.0000],
[7.5000, 9.0000]])
tensor([[0.5000, 0.0000],
[0.5000, 0.0000],
[0.0000, 1.0000]])
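Side note: if the index lists partition the d input dimensions, the same averaging matrix can also be built without the Python loop from a group-assignment vector and a one-hot encoding. This is only a sketch; the group tensor below is written out by hand to match indices = [[0, 1], [2]]:

import torch
import torch.nn.functional as F

group = torch.tensor([0, 0, 1])                        # group[k] = output group of input dim k
counts = torch.bincount(group)                         # group sizes: tensor([2, 1])
avg_mat_T = F.one_hot(group).to(torch.float) / counts  # shape (d, m), column i holds 1.0/z_i
x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(x.to(torch.float) @ avg_mat_T)                   # same result as res above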
I found out about torch.scatter, maybe that's a good solution to my troubles…
Yes, .scatter_reduce_ should work, but you would need to modify your index and check the count:
# map each input dimension to its output group: tensor([0, 0, 1])
idx = torch.tensor(sum([[i]*len(a) for i, a in enumerate(indices)], []))
# number of input dimensions per group: tensor([2, 1])
_, count = idx.unique(return_counts=True)
# repeat the index for every row of x (here n = d = 3)
idx = idx.unsqueeze(0).expand(3, 3)
out = torch.zeros(idx.size(0), idx.max()+1, dtype=x.dtype)
out.scatter_reduce_(1, idx, x, reduce="sum")  # sum the entries of each group per row
out = out / count                             # divide by the group sizes to get the mean
print(out)
# tensor([[1.5000, 3.0000],
# [4.5000, 6.0000],
# [7.5000, 9.0000]])
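In case it helps, the same computation with the sizes derived from x and idx instead of hard-coded (just a sketch; the names idx_flat, idx2 and out2 are mine):

idx_flat = torch.tensor(sum([[i]*len(a) for i, a in enumerate(indices)], []))
_, count = idx_flat.unique(return_counts=True)
idx2 = idx_flat.unsqueeze(0).expand(x.size(0), -1)            # (n, d) instead of hard-coded (3, 3)
out2 = torch.zeros(x.size(0), int(idx_flat.max()) + 1, dtype=x.dtype)
out2.scatter_reduce_(1, idx2, x, reduce="sum")
print(out2 / count)                                           # same result as above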
Thank you very much!