Row-wise comparisons between 2D-tensors

Hi everyone!

I’m trying to compare all row elements of two 2D tensors. A simple example would be the following two tensors:

a = torch.tensor([[1,2], [4,5], [7,8]])
b = torch.tensor([[2,3], [7,5], [-1,7]])

Now I’d like to compare each element of the first tensor against every element of the same row in the second tensor. My expected result would be

[
[False, False] (1 vs [2, 3])
[True, False] (2 vs [2, 3])
[False, False] (4 vs [7, 5])
[False, True] (5 vs [7, 5])
[False, True] (7 vs [-1, 7])
[False, False] (8 vs [-1, 7])
]

Does anyone have any idea how to solve this efficiently?

Thanks a lot!

It certainly isn’t efficient memory-wise, but you could check whether it yields a speedup compute-wise:

a = torch.tensor([[1,2], [4,5], [7,8]])
b = torch.tensor([[2,3], [7,5], [-1,7]])

ret = a.view(-1, 1, 1) == b
idx = torch.arange(3).unsqueeze(1).expand(-1, 2).reshape(-1)
print(ret[torch.arange(ret.size(0)), idx])
# > tensor([[False, False],
#           [ True, False],
#           [False, False],
#           [False,  True],
#           [False,  True],
#           [False, False]])

That is very helpful, thank you very much!

Another implementation:

res = a.repeat_interleave(2, dim=1).reshape(-1, 2) == b.repeat_interleave(2, dim=0)

Thanks a lot for this, cool to see that there are so many possibilities to solve this problem!

Eta_C’s solution seems to be quite a bit faster for large tensors (shape [10000, 2]):

N=10000
a = torch.rand([N,2])
b = torch.rand([N,2])

from timeit import default_timer as timer
start = timer()
idx = torch.arange(N).unsqueeze(1).expand(-1, 2).reshape(-1)
for _ in range(500):
    ret = a.view(-1, 1, 1) == b

    res = ret[torch.arange(ret.size(0)), idx]
end = timer()
print(end - start)

start2 = timer()
for _ in range(500):
    res = a.repeat_interleave(2, dim=1).reshape(-1, 2) == b.repeat_interleave(2, dim=0)
end2 = timer()
print(end2 - start2)

121.2189056
0.11425909999999817
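
For what it’s worth, the large intermediate in the first approach (shape [2N, N, 2]) can be avoided by broadcasting only within each row. This is a minimal sketch, assuming a and b have the same number of rows; it reuses the small example tensors from the question and has not been timed here:

```python
import torch

a = torch.tensor([[1, 2], [4, 5], [7, 8]])
b = torch.tensor([[2, 3], [7, 5], [-1, 7]])

# a.unsqueeze(2) has shape [N, 2, 1], b.unsqueeze(1) has shape [N, 1, 2].
# Broadcasting compares a[i, j] only with b[i, k], giving shape [N, 2, 2].
res = (a.unsqueeze(2) == b.unsqueeze(1)).reshape(-1, b.size(1))
print(res)

# Cross-check against the repeat_interleave version posted above.
ref = a.repeat_interleave(2, dim=1).reshape(-1, 2) == b.repeat_interleave(2, dim=0)
print(torch.equal(res, ref))
```

The intermediate here only grows linearly with N, so it should be much lighter on memory than comparing every element of a against all of b.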

Hi, hope you’re doing well.
I have two tensors of unequal size:

a = torch.tensor([[8,2], [5,3],[4,4]])
b = torch.tensor([[1,2],[5,3]])

I want a boolean tensor indicating whether each row of a exists in b, without iterating. Something like
a in b
which should give

[False, True, False]
Would you please help me?
Thanks in advance

This should work:

a = torch.tensor([[8,2], [5,3],[4,4]])
b = torch.tensor([[1,2],[5,3]])

res = (a.unsqueeze(0) == b.unsqueeze(1)).all(dim=2).any(dim=0)
print(res)
# > tensor([False,  True, False])

The all(dim=2) operation makes sure that all elements of a pair of rows match, while the any(dim=0) operation checks, for each row in a, whether any row in b matches it.
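
To make the shapes explicit, here is the same expression split into its steps. The intermediate names eq and row_match are only introduced for illustration:

```python
import torch

a = torch.tensor([[8, 2], [5, 3], [4, 4]])  # shape [3, 2]
b = torch.tensor([[1, 2], [5, 3]])          # shape [2, 2]

# [1, 3, 2] compared with [2, 1, 2] broadcasts to [2, 3, 2]:
# eq[j, i, k] is True if b[j, k] == a[i, k].
eq = a.unsqueeze(0) == b.unsqueeze(1)

# Shape [2, 3]: row_match[j, i] is True if the whole row b[j] equals a[i].
row_match = eq.all(dim=2)

# Shape [3]: res[i] is True if any row of b equals a[i].
res = row_match.any(dim=0)

print(eq.shape, row_match.shape, res)
```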

Thanks a lot…it’s very helpful.

Hi, I was looking for the same thing and came up with a similar solution. However, could this approach cause huge memory consumption if the tensors involved are large? If yes, is there any other solution that uses less memory and does not require loops? Thanks!

Yes, the memory usage could be large since you are broadcasting the tensors and need to calculate the intermediates. Using loops would have a lower memory footprint, but could be slower. Your best bet might be to write a custom C++/CUDA operation for your use case and check if you could get a proper speedup without a large memory requirement.
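
As a middle ground, you could loop over chunks of one tensor instead of over single rows, which bounds the size of the broadcast intermediate. This is only a sketch under assumptions: the helper name rows_in and the chunk_size parameter are made up for this example, and whether it is fast enough depends on your sizes and device:

```python
import torch

def rows_in(a, b, chunk_size=1024):
    """Return a bool tensor of shape [a.size(0)]; entry i is True if row a[i] appears in b."""
    out = torch.zeros(a.size(0), dtype=torch.bool, device=a.device)
    # Process b in chunks so the intermediate is at most [chunk_size, a.size(0), D].
    for start in range(0, b.size(0), chunk_size):
        b_chunk = b[start:start + chunk_size]
        out |= (a.unsqueeze(0) == b_chunk.unsqueeze(1)).all(dim=2).any(dim=0)
    return out

a = torch.tensor([[8, 2], [5, 3], [4, 4]])
b = torch.tensor([[1, 2], [5, 3]])
print(rows_in(a, b, chunk_size=1))
```

A smaller chunk_size lowers peak memory at the cost of more Python-level iterations, so it is worth tuning for your use case.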

Hi, happy new year…wish you a happy and healthy year
I have a question, would you please answer me?
I have 2 tensors:
tensor([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7,
7, 7, 8, 8, 8, 8, 8, 9, 9, 10, 10, 10, 11, 12, 12, 13, 13, 13,
13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 20, 20, 21,
21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 27, 27,
27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31,
31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33])
and
tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 17, 19, 21, 31, 0, 2,
3, 7, 13, 17, 19, 21, 30, 0, 1, 3, 7, 8, 9, 13, 27, 28, 32, 0,
1, 2, 7, 12, 13, 0, 6, 10, 0, 6, 10, 16, 0, 4, 5, 16, 0, 1,
2, 3, 0, 2, 30, 32, 33, 2, 33, 0, 4, 5, 0, 0, 3, 0, 1, 2,
3, 33, 32, 33, 32, 33, 5, 6, 0, 1, 32, 33, 0, 1, 33, 32, 33, 0,
1, 32, 33, 25, 27, 29, 32, 33, 25, 27, 31, 23, 24, 31, 29, 33, 2, 23,
24, 33, 2, 31, 33, 23, 26, 32, 33, 1, 8, 32, 33, 0, 24, 25, 28, 32,
33, 2, 8, 14, 15, 18, 20, 22, 23, 29, 30, 31, 33, 8, 9, 13, 14, 15,
18, 19, 20, 22, 23, 26, 27, 28, 29, 30, 31, 32])
and I have one more tensor named “a” with a size of 34×34.
I want to access some, but not all, elements of “a” based on the two previous tensors.
For example, I need a[0][1], a[0][2], a[0][3], a[0][4], a[0][5], a[0][6], a[0][7], a[0][8], but I don’t need a[0][9], because 9 is not among the entries of the second tensor that are paired with 0 in the first. Likewise I need a[1][2], a[1][3], a[1][7], but not a[1][4], because 4 is not paired with 1 in the second tensor.
Thanks in advance

Direct indexing should work:

x = torch.tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
                  1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3,
                  3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 7, 7,
                  7, 7, 8, 8, 8, 8, 8, 9, 9, 10, 10, 10, 11, 12, 12, 13, 13, 13,
                  13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 19, 19, 19, 20, 20, 21,
                  21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 25, 25, 25, 26, 26, 27, 27,
                  27, 27, 28, 28, 28, 29, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 31, 31,
                  31, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 33, 33, 33, 33, 33,
                  33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33])

y = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13, 17, 19, 21, 31, 0, 2,
                  3, 7, 13, 17, 19, 21, 30, 0, 1, 3, 7, 8, 9, 13, 27, 28, 32, 0,
                  1, 2, 7, 12, 13, 0, 6, 10, 0, 6, 10, 16, 0, 4, 5, 16, 0, 1,
                  2, 3, 0, 2, 30, 32, 33, 2, 33, 0, 4, 5, 0, 0, 3, 0, 1, 2,
                  3, 33, 32, 33, 32, 33, 5, 6, 0, 1, 32, 33, 0, 1, 33, 32, 33, 0,
                  1, 32, 33, 25, 27, 29, 32, 33, 25, 27, 31, 23, 24, 31, 29, 33, 2, 23,
                  24, 33, 2, 31, 33, 23, 26, 32, 33, 1, 8, 32, 33, 0, 24, 25, 28, 32,
                  33, 2, 8, 14, 15, 18, 20, 22, 23, 29, 30, 31, 33, 8, 9, 13, 14, 15,
                  18, 19, 20, 22, 23, 26, 27, 28, 29, 30, 31, 32])

a = torch.randn(34, 34)
ret = a[x, y]

reference = []
for x_, y_ in zip(x, y):
    reference.append(a[x_, y_])
reference = torch.stack(reference)

print((ret == reference).all())
# > tensor(True)
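
The same pairing is easier to see on a small tensor. In this sketch the names small, rows and cols are just illustrative; the index tensors are paired element by element, and the same pairs can also be used to build a boolean mask:

```python
import torch

small = torch.arange(12).reshape(3, 4)
rows = torch.tensor([0, 0, 2])
cols = torch.tensor([1, 3, 0])

# Picks small[0, 1], small[0, 3] and small[2, 0].
picked = small[rows, cols]
print(picked)

# Mark the selected positions in a mask of the same shape.
mask = torch.zeros_like(small, dtype=torch.bool)
mask[rows, cols] = True
print(mask)
```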

Hi, Thank you…it’s very helpful

Hi, hope you’re doing well…
I have a dataset and split it into train_mask and test_mask:

from sklearn.model_selection import train_test_split
train_mask, test_mask= train_test_split(x, test_size=0.33, random_state = 0, shuffle = True)

Then I used

train_mask = (x.unsqueeze(0) == train_mask.unsqueeze(1)).all(dim=2).any(dim=0)
test_mask = (x.unsqueeze(0) == test_mask.unsqueeze(1)).all(dim=2).any(dim=0)

to make it usable for PyG. After splitting, train_mask and test_mask have sizes torch.Size([33, 200]) and torch.Size([17, 200]), but after running the code above I get 47 and 35 True values for train_mask and test_mask.
Where am I making a mistake? Would you please help me?
x is a tensor of size torch.Size([50, 200]). I’ve saved it as a .pt file, but because of upload limitations I cannot attach it here.
Thanks in advance

Does your dataset contain duplicates?
If so, it would be expected that your check yields more True values in the masks than there are rows in each split, since a duplicated row can be matched by both splits.
You should be able to check it via x.sum(dim=1).unique().size().
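
Note that comparing row sums is only a quick heuristic, since different rows can share the same sum. A stricter check compares full rows via torch.unique with dim=0. The tiny tensor below is made up to show how a duplicate inflates both masks:

```python
import torch

x = torch.tensor([[1, 2], [3, 4], [1, 2], [5, 6]])  # rows 0 and 2 are identical

# Fewer unique rows than total rows means the dataset contains duplicates.
num_unique = torch.unique(x, dim=0).size(0)
print(x.size(0), num_unique)

# Toy split: both parts end up containing the row [1, 2].
train = x[:2]
test = x[2:]
train_mask = (x.unsqueeze(0) == train.unsqueeze(1)).all(dim=2).any(dim=0)
test_mask = (x.unsqueeze(0) == test.unsqueeze(1)).all(dim=2).any(dim=0)

# Each split has 2 rows, but each mask marks 3 rows of x because of the duplicate.
print(train_mask.sum().item(), test_mask.sum().item())
```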

Yes, my dataset contains duplicates. This is because I simulate the data and don’t have a real dataset. I have to find a way to avoid identical rows in my dataset. Thanks very much for your prompt response.