Gather values for multiple indices

itay1itzhak · December 28, 2020, 9:44pm

Hi all,
I'm having trouble gathering and aggregating values for repeated indices, and would appreciate any ideas for solving it.

So, I have 2 tensors:
tgt - tokens (just numbers from a vocab)
probs - the prob for each corresponding token

I want the total probability for each unique token.
Each token can appear more than once in tgt, so its probs need to be summed, and this has to happen for every token.

In general the shapes are:

tgt.shape (Batch_size X k)
probs.shape (Batch_size X k)

For example (with no batch dimension for simplicity):

tgt = [3, 7, 3, 5, 11, 11, 11]
probs = [-0.2, -0.4, -0.5, -0.8, -0.1, -0.7, -0.2]
# We want the summed probs:
summed_probs = [-0.7, -0.4, -0.7, -0.8, -1.0, -1.0, -1.0]
# OR
summed_probs = [(3, -0.7),(7, -0.4), (5, -0.8), (11, -1.0)]

Right now I'm doing it with iterative code using Numba, but I would like a solution based on tensor operations.

Thanks!

ptrblck · January 6, 2021, 9:29am

If the order of the tgt values doesn’t matter, this would work:

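import torch

tgt = torch.tensor([3, 7, 3, 5, 11, 11, 11])
probs = torch.tensor([-0.2, -0.4, -0.5, -0.8, -0.1, -0.7, -0.2])

# u: sorted unique tokens; u_idx: for each position in tgt, its index into u
u, u_idx = tgt.unique(return_inverse=True)
# Sum the probs of all positions that share the same unique token
ret = torch.zeros(len(u))
ret.scatter_add_(0, u_idx, probs)
print(ret)  # one sum per entry of u, i.e. for tokens 3, 5, 7, 11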

and you could create the tuples if needed afterwards.
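
To make the last step concrete, here is a minimal sketch of how both output formats from the question could be derived from the snippet above. The names per_position and pairs are made up for illustration, and the sums come out ordered by the sorted unique tokens rather than by first appearance in tgt.

```python
import torch

tgt = torch.tensor([3, 7, 3, 5, 11, 11, 11])
probs = torch.tensor([-0.2, -0.4, -0.5, -0.8, -0.1, -0.7, -0.2])

# Same aggregation as above: one summed prob per sorted unique token
u, u_idx = tgt.unique(return_inverse=True)
ret = torch.zeros(len(u), dtype=probs.dtype)
ret.scatter_add_(0, u_idx, probs)

# Format 1: broadcast each token's sum back to every position where it occurs
per_position = ret[u_idx]

# Format 2: (token, summed prob) tuples, ordered by token value
pairs = list(zip(u.tolist(), ret.tolist()))

print(per_position)
print(pairs)
```

Indexing ret with u_idx works because u_idx already stores, for every position in tgt, the slot of its token in u.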

itay1itzhak · January 12, 2021, 4:00pm

Thanks for your response!
But if I'm not mistaken, this solution only works for the 1-dimensional case. My example used 1 dimension, but the actual problem is 2-dimensional.
For example:

tgt = torch.tensor([[3, 7, 3, 5, 11, 11, 11],[1, 1, 1, 4, 11, 15, 4]])
probs = torch.tensor([[-0.2, -0.4, -0.5, -0.8, -0.1, -0.7, -0.2],[-0.1, -0.9, -0.1, -0.4, -0.3, -0.3, -0.8]]) 

summed_probs = [[-0.7, -0.4, -0.7, -0.8, -1.0, -1.0, -1.0],[-1.1, -1.1, -1.1, -1.2, -0.3, -0.3, -1.2]]
# OR
summed_probs = [[(3, -0.7), (7, -0.4), (5, -0.8), (11, -1.0)], [(1, -1.1), (4, -1.2), (11, -0.3), (15, -0.3)]]
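
One possible way to handle the batched case, sketched under assumptions rather than taken from the thread: scatter each row's probs into a (Batch_size X vocab_size) buffer indexed by token id, then gather the sums back per position. Here vocab_size is assumed to be tgt.max() + 1 and the tokens are assumed to be non-negative integer ids; in practice the real vocabulary size would be used. A second variant avoids the vocabulary-sized buffer by comparing every pair of positions within a row, at O(k^2) memory per row.

```python
import torch

tgt = torch.tensor([[3, 7, 3, 5, 11, 11, 11],
                    [1, 1, 1, 4, 11, 15, 4]])
probs = torch.tensor([[-0.2, -0.4, -0.5, -0.8, -0.1, -0.7, -0.2],
                      [-0.1, -0.9, -0.1, -0.4, -0.3, -0.3, -0.8]])

# Variant A: per-row scatter into a vocabulary-sized buffer.
# Assumption: tokens are non-negative ids smaller than vocab_size.
vocab_size = int(tgt.max()) + 1
sums = torch.zeros(tgt.size(0), vocab_size, dtype=probs.dtype)
sums.scatter_add_(1, tgt, probs)      # sums[b, token] = total prob of token in row b
summed_probs = sums.gather(1, tgt)    # read each position's total back, shape (B, k)

# Variant B: pairwise equality mask, no dependence on the vocabulary size.
same = tgt.unsqueeze(2) == tgt.unsqueeze(1)            # (B, k, k), True where tokens match
summed_alt = (same * probs.unsqueeze(1)).sum(dim=2)    # sum probs over matching positions

print(summed_probs)
print(summed_alt)
```

Variant A scales with the vocabulary size and Variant B with k squared, so which one is preferable likely depends on whether the vocabulary or the sequence length k is the larger quantity.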