I’ve also been trying to figure this out lately, and I think I’ve found a good solution. It works without NumPy, so there’s no need to transfer your tensors to the CPU (which is really slow while training models). Check this out:

>>> import torch
>>> a = torch.tensor([1, 2, 3, 6]).cuda()
>>> b = torch.tensor([0, 2, 3, 7]).cuda()
>>> intersection = (a * (a == b).float()).nonzero().flatten()
>>> intersection
tensor([1, 2], device='cuda:0')
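One caveat worth noting: the snippet above compares `a` and `b` positionwise, so it requires equal-length tensors and returns the *indices* where they happen to match (and a 0 in the intersection would be masked out by the multiplication). For an order-independent set intersection that stays on the GPU, a minimal sketch using `torch.isin` (available in PyTorch >= 1.10) could look like this; shown on CPU tensors for portability, but it works the same on CUDA tensors:

```python
import torch

def set_intersect(t1, t2):
    # mask of positions in t1 whose values appear anywhere in t2,
    # computed on whatever device the inputs already live on
    mask = torch.isin(t1, t2)
    # unique() deduplicates and returns the values sorted
    return torch.unique(t1[mask])

a = torch.tensor([1, 2, 3, 6])
b = torch.tensor([0, 2, 3, 7])
print(set_intersect(a, b))  # tensor([2, 3])
```

Unlike the positionwise trick, this gives the intersected values themselves, regardless of ordering or tensor lengths.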

Not sure if this would help, as the code below iterates over all elements of t2 in a Python for-loop and hence would not benefit from the GPU. In fact, numpy.intersect1d is much faster.

def tensor_intersect(t1, t2):
    t1 = t1.cuda()
    t2 = t2.cuda()
    # accumulate a mask of the positions in t1 that match any element of t2
    indices = torch.zeros_like(t1, dtype=torch.bool)
    for elem in t2:
        indices = indices | (t1 == elem)
    intersection = t1[indices]
    return intersection

Here’s a tweak that’s 15 to 20 times faster than numpy.intersect1d (for large sets). For small sets it is better to stick with NumPy, until someone writes a better search-based torch algorithm:
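The tweak itself isn’t quoted above, but a common vectorized pattern for this is to concatenate both tensors and use torch.unique with return_counts: any value with a count greater than one must occur in both inputs. A minimal sketch, assuming each input tensor contains no duplicate values:

```python
import torch

def tensor_intersect_unique(t1, t2):
    # assumes t1 and t2 each have no duplicates; if they might,
    # deduplicate them first with t.unique()
    combined = torch.cat((t1, t2))
    uniques, counts = combined.unique(return_counts=True)
    # a value counted twice appeared in both t1 and t2
    return uniques[counts > 1]

a = torch.tensor([1, 2, 3, 6])
b = torch.tensor([0, 2, 3, 7])
print(tensor_intersect_unique(a, b))  # tensor([2, 3])
```

This is a single vectorized pass with no Python loop, which is why it scales well on the GPU for large sets.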

Thanks for sharing. Your solution could even extend to N-dimensional vectors, e.g., 3D coordinates. In addition, the indices of the intersection can also be obtained this way if return_inverse is set to True in the torch.unique function.
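To make the return_inverse idea concrete, here is one hedged sketch of recovering where the intersection values sit in t1 (again assuming no duplicates within each input):

```python
import torch

def intersect_with_indices(t1, t2):
    combined = torch.cat((t1, t2))
    uniques, inverse, counts = combined.unique(
        return_inverse=True, return_counts=True)
    values = uniques[counts > 1]
    # inverse[:len(t1)] maps each element of t1 to its slot in `uniques`,
    # so marking the duplicated slots picks out t1's intersecting positions
    mask = (counts > 1)[inverse[:t1.numel()]]
    idx_in_t1 = mask.nonzero().flatten()
    return values, idx_in_t1

a = torch.tensor([1, 2, 3, 6])
b = torch.tensor([0, 2, 3, 7])
values, idx = intersect_with_indices(a, b)
print(values)  # tensor([2, 3])
print(idx)     # tensor([1, 2])
```

For the N-dimensional case (e.g. rows of 3D coordinates), torch.unique also accepts a dim argument, so the same counting trick works row-wise with unique(dim=0, return_counts=True).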