How to get indices from a tensor where the scores (values) satisfy a condition after using torch.topk

I have a tensor of shape NxN, which is the similarity (inner product) of two tensors. I want to get all the values that are above some threshold, say 0.5.

The result I’m looking for is something like: for each index i, get all the values whose similarity is above 0.5. Obviously, for each index i there will be a minimum of 1 element (self-similarity is 1.0) and a maximum of N elements (similarities is an NxN matrix with 1 on the diagonal).

How could I go about this?

import torch
x = torch.randn((9052, 512))
similarities = x @ x.T
scores, indices = torch.topk(similarities, x.shape[0]) # topk == shape[0] means get all the values, sorted

I have tried

mask = torch.ones(scores.size()[0])
mask = 1 - mask.diag()
sim_vec = torch.nonzero((scores >= 0.5)*mask)

This gives me a tensor of shape: torch.Size([51147416, 2])

I’ve also tried

(scores > 0.5 ).nonzero()

It too gives me a tensor of shape: torch.Size([51147416, 2])

Well, that sounds about right: scores is a 9052x9052 matrix with roughly 82 million elements, and about 51 million of them (a bit over 60%) match that condition. Maybe you want something else?

So how do I interpret the results in this case? The only thing I could think of was to loop through each element and save the matches in a list, but that would be time-consuming.
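On reading the nonzero output: each row of the result is a [row, column] pair into scores. Here is a minimal sketch on a toy 4x4 matrix rather than the 9052x9052 one. Because scores comes from torch.topk, the second entry is a rank position in the sorted row, so the original column has to be looked up through indices. The names pairs, rows, ranks and orig_cols are made up for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)                      # toy stand-in for the real data
similarities = x @ x.T
scores, indices = torch.topk(similarities, x.shape[0])

pairs = (scores > 0.5).nonzero()           # shape (K, 2); each row is [row, rank]
rows, ranks = pairs[:, 0], pairs[:, 1]

# scores is sorted per row, so map the rank back to the original column.
orig_cols = indices[rows, ranks]

# Sanity check: reading the values either way gives the same numbers.
assert torch.equal(scores[rows, ranks], similarities[rows, orig_cols])
```

So a pair [i, r] means "the r-th largest similarity of item i is above 0.5", and indices[i, r] says which item that similarity belongs to.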

I think you could do something like this:

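scores_flat = scores.flatten()
# 1-D positions into scores_flat where the condition holds
idx = (scores_flat > 0.5).nonzero().squeeze(1)
# gather from the flattened tensor, since idx indexes scores_flat, not the rows of scores
scores_match = torch.gather(scores_flat, dim=0, index=idx)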

So then scores_match would be the scores that match the condition scores > 0.5. Haven’t tried this so may need some experimentation. Hope this is what you want!
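If what you want is one list of matches per index i, a possible sketch is to build a boolean mask and split the matches by how many each row has. This is an alternative to the flatten/gather idea, not something from the posts above. It skips torch.topk, on the assumption that the sorted order is not needed, and uses a small toy matrix. The names mask, counts, cols_per_row and vals_per_row are hypothetical.

```python
import torch

torch.manual_seed(0)
x = torch.randn(6, 16)                     # toy stand-in for the real data
similarities = x @ x.T

mask = similarities > 0.5                  # (N, N) boolean mask
rows, cols = mask.nonzero(as_tuple=True)   # matches in row-major order
counts = mask.sum(dim=1).tolist()          # number of matches in each row

# Row-major order means each row's matches are contiguous,
# so splitting by the per-row counts groups them by row.
cols_per_row = torch.split(cols, counts)               # original column indices
vals_per_row = torch.split(similarities[mask], counts)  # matching values
```

Here cols_per_row[i] holds the indices j with similarities[i, j] > 0.5, and vals_per_row[i] holds the corresponding values, with no Python loop over individual elements. For the full 9052x9052 case the result still contains tens of millions of entries, so memory is worth keeping an eye on.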