Efficient way to get index of values from one tensor in another

I have two very large tensors a and b, each has about one million values.
Now I want to check whether the elements of b exist in a and, if so, return their indices in a. I know the easiest way to implement this is:

exist_elements_id, index_in_a = b.view(-1, 1).eq(a).nonzero(as_tuple=True)
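For reference, here is what that one-liner returns on a toy pair of tensors (the values are made up for illustration). The first output indexes into b, the second into a:

```python
import torch

# Hypothetical toy data, small enough that the dense comparison is harmless.
a = torch.tensor([10, 20, 30, 40])
b = torch.tensor([30, 99, 10])

# b.view(-1, 1).eq(a) broadcasts to a (len(b), len(a)) boolean matrix.
mask = b.view(-1, 1).eq(a)
exist_elements_id, index_in_a = mask.nonzero(as_tuple=True)

print(mask.shape)          # torch.Size([3, 4])
print(exist_elements_id)   # tensor([0, 2])  -> positions in b
print(index_in_a)          # tensor([2, 0])  -> matching positions in a
```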

However, this causes a CUDA out-of-memory error (it needs about 1 TB of memory) because eq() materializes a len(b) x len(a) boolean matrix, i.e. roughly 1,000,000 x 1,000,000 elements.
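The 1 TB figure follows directly from the shapes, since a torch.bool element takes one byte. A quick back-of-the-envelope check, assuming exactly one million elements in each tensor:

```python
import torch

n_a = 1_000_000
n_b = 1_000_000
# Bytes per element of a bool tensor (1 for torch.bool).
bytes_per_bool = torch.empty(0, dtype=torch.bool).element_size()

# Size of the (n_b, n_a) mask produced by b.view(-1, 1).eq(a).
mask_bytes = n_a * n_b * bytes_per_bool
print(mask_bytes)          # 1000000000000
print(mask_bytes / 1e12)   # 1.0, i.e. about 1 TB
```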

Is there any other memory-efficient and fast way to implement this? Thanks!

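1) Concatenate a and b into a tensor C.
2) Stable-sort C on CPU and keep the sort indices.
3) Run torch.unique_consecutive on the sorted values and return the counts and indices.
4) Select the groups whose count is 2 (assuming neither a nor b contains duplicates, these are the values present in both).
5) Map those indices back to positions in C. Because the sort is stable, the first element of each such group points into the first tensor (a).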
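A minimal sketch of the steps above, assuming 1-D tensors with no duplicate values inside a or inside b (the function name `indices_in_a` is hypothetical). It uses only O(len(a) + len(b)) memory and runs on whatever device the inputs are on; move them to CPU first if you follow step 2 literally. Group start positions are computed from the counts with a cumulative sum, and the results are reordered by position in b so they line up with the `nonzero` one-liner:

```python
import torch

def indices_in_a(a: torch.Tensor, b: torch.Tensor):
    """Return (idx_in_b, idx_in_a) for every element of b that also occurs in a.

    Sketch only: assumes a and b are 1-D and each has unique values,
    so a value present in both tensors forms a group of exactly 2.
    """
    n_a = a.numel()
    c = torch.cat([a, b])                          # 1) C = [a, b]
    sorted_c, order = torch.sort(c, stable=True)   # 2) stable sort, keep indices
    # 3) group equal neighbours; counts[i] is the size of group i
    _, counts = torch.unique_consecutive(sorted_c, return_counts=True)
    starts = torch.cumsum(counts, dim=0) - counts  # start of each group in sorted_c
    starts = starts[counts == 2]                   # 4) values found in both a and b
    # 5) stable sort keeps the element from a (lower index in C) first
    idx_in_a = order[starts]
    idx_in_b = order[starts + 1] - n_a
    # Reorder by position in b to match the ordering of nonzero()
    perm = torch.argsort(idx_in_b)
    return idx_in_b[perm], idx_in_a[perm]

a = torch.tensor([10, 20, 30, 40])
b = torch.tensor([30, 99, 10])
idx_b, idx_a = indices_in_a(a, b)
print(idx_b, idx_a)   # tensor([0, 2]) tensor([2, 0])
```

If a or b can contain duplicates, the `counts == 2` filter no longer holds and the grouping logic would need to track which tensor each group member came from.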

disclaimer: I haven’t done this in practice