Indexing with repeating indices (numpy.add.at)

Hi,
Simple problem, already solved in NumPy by np.add.at: I want to index with repeating indices over two dimensions. A simple 1D example:

l = torch.ones(5)
i = torch.LongTensor([1, 2, 2, 3])
l[i] += 1
print(l)
Out[52]: 

 1
 2
 2 # should be 3
 2
 1
[torch.FloatTensor of size 5]

If you need to think of a real use case:

# l_ij = 1 with i from m1 and j from m2
l = torch.autograd.Variable(torch.LongTensor(10, 10))
m1 = torch.LongTensor(1, 255, 255).random_(0, 10)
m2 = torch.LongTensor(1, 255, 255).random_(0, 10)
l[m1, m2] = 1

This kind of indexing does not work because there are repeated indices in both index tensors. @Soumith_Chintala pointed me to index_add_, which does solve part of the indexing problem, but I can't come up with a solution comparable to the NumPy one described in https://stackoverflow.com/questions/46114340/numpy-advanced-indexing-same-index-used-multiple-times-in.
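
For reference, this is the NumPy behavior I am after (a minimal sketch of the 1D example above):

import numpy as np

l = np.ones(5)
i = np.array([1, 2, 2, 3])
np.add.at(l, i, 1)  # unbuffered add: the repeated index 2 is incremented twice
print(l)            # [1. 2. 3. 2. 1.]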

Thanks

I don’t think using index_add_ is possible in this case; it is an in-place op, which is no good on a Variable.

For your first example, you could do something like the following:

l = torch.ones(5)
i = torch.LongTensor([1, 2, 2, 3])
# histc counts how many times each index occurs, duplicates included
add_this = i.float().histc(bins=5, min=0, max=4)
l += add_this  # l is now [1., 2., 3., 2., 1.]

Thank you @richard. The 1D example was just to demonstrate the issue; I follow your reasoning, but I am not sure it can be applied to a 2D example where the indices come from two separate tensors :frowning:

Could you give an example of what you’d want from a 2D example?

Sure, let’s say you have a first index tensor i containing indices between 0 and 4:

Variable containing:
(0 ,.,.) = 
  4  4  3  0
  3  4  2  3
  2  3  1  1
  1  4  3  1
[torch.LongTensor of size 1x4x4]

and a second index tensor j of same size:

Variable containing:
(0 ,.,.) = 
  1  1  1  1
  1  1  1  1
  1  1  1  1
  1  1  1  1
[torch.LongTensor of size 1x4x4]

finally a matrix G of size 5x5 initially set to 0.

G = torch.LongTensor(5,5).zero_()

The indexing would proceed like this: pair up the corresponding entries of i and j, and increment G at each index pair.

G[i,j] += 1

For example, for the first rows of i and j:

G[4, 1] += 1
G[4, 1] += 1
G[3, 1] += 1
G[0, 1] += 1

The current indexing only increases G[4, 1] by 1 instead of 2.
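
In NumPy terms, what I am after is (a sketch, writing the tensors above as plain arrays):

import numpy as np

i = np.array([[4, 4, 3, 0],
              [3, 4, 2, 3],
              [2, 3, 1, 1],
              [1, 4, 3, 1]])
j = np.ones((4, 4), dtype=np.int64)
G = np.zeros((5, 5), dtype=np.int64)
np.add.at(G, (i, j), 1)  # every (i, j) pair increments G, duplicates included
print(G[4, 1])           # 4: the pair (4, 1) occurs four times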

Hey,
Does anyone have an idea about this example of indexing with repeated indices? :slight_smile: Thanks!

I think it’s time to open an issue!! :slight_smile: I will post the solution here, if there is one

The sparse trick probably does not work because you need gradients, right?
That might be as easy as writing an autograd function for dense + sparse, or even just coalesce…
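
Something along these lines, perhaps (an untested sketch; it assumes i and j are plain LongTensors and uses the old-style sparse constructor):

# duplicate entries are summed when the sparse tensor is coalesced
idx = torch.stack([i.view(-1), j.view(-1)])  # 2 x N index matrix
vals = torch.ones(idx.size(1))
G = torch.sparse.FloatTensor(idx, vals, torch.Size([5, 5])).coalesce().to_dense()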

Best regards

Thomas

Yes, I need gradients; I wanted to use indexing for code simplicity and speed. If there is no ‘easy’ solution, I would rather increase the complexity than invest too much time in the implementation. Thanks for your help @tom

On the latest version of PyTorch you can use Tensor.put_ with accumulate=True:

http://pytorch.org/docs/master/tensors.html#torch.Tensor.put_

You will need to translate your indices into linear indices, though. Most other in-place indexing functions (like index_add_) have undefined behavior for duplicate indices.

For example:

l = torch.autograd.Variable(torch.LongTensor(10, 10).zero_())
m1 = torch.LongTensor(1, 255, 255).random_(0, 10)
m2 = torch.LongTensor(1, 255, 255).random_(0, 10)
# compute linear index
m3 = torch.autograd.Variable(m1 * 10 + m2)  
# make ones the same size as index
values = torch.autograd.Variable(torch.LongTensor([1])).expand_as(m3)
l.put_(m3, values, accumulate=True)
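
Note that m1 * 10 + m2 is just the row-major linear index for the 10x10 target; for a general two-dimensional target the sketch would be:

# row-major linear index of (row, col) in a matrix with l.size(1) columns
m3 = torch.autograd.Variable(m1 * l.size(1) + m2)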

This feature comes at the perfect time, thank you @colesbury, can’t wait to try this!

Is there any chance that put_ will someday support multidimensional tensors? I need to update class centroids with n-dimensional feature vectors.

index_add_ seems to work now with duplicated indices. Can this be relied upon?

I don’t see any update in the docs regarding duplicate indices, so I would defer to @colesbury’s comment that it’s undefined behavior. How did you check it?
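
E.g. a quick way to check (a sketch; the result may vary across versions if the behavior really is undefined):

t = torch.zeros(5)
idx = torch.LongTensor([1, 2, 2, 3])
src = torch.ones(4)
t.index_add_(0, idx, src)
print(t)  # [0., 1., 2., 1., 0.] if duplicates accumulate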