Hi all,

I have a dot product which I aim to optimise by using some a priori information.

I have two matrices, i.e. Q (n x d) and K (d x n), and since n can be very large this results in quadratic complexity. But I also have a symmetric indicator matrix A (n x n) where each row has knn ones and n-knn zeros (knn < n, always). I use this matrix A to select the relevant columns in K for each row in Q. So now, my question is, how can I efficiently implement the outlined code part below? And, how can I store A as efficient as possible in terms of memory?

```
import torch
import time
import random
from sklearn.neighbors import kneighbors_graph
n = 1024
d = 512
knn = 64
q = torch.randn(n, d)
k = torch.randn(d, n)
coords = torch.tensor([(random.random(), random.random()) for _ in range(n)])
A = torch.tensor(kneighbors_graph(coords, knn, mode=‘connectivity’, include_self=False).todense()).bool()
t1_start = time.time()
#start optimise; goal: this should be more efficient than full dot product
outs = [q[i].matmul(k[:,A[i]]) for i in range(n)]
outs = torch.stack(outs)
#end optimise
t1_stop = time.time()
time_need = t1_stop - t1_start
print(time_need) # 0.1741
# this is the full dot product
t1_start = time.time()
outs=q.matmul(k)
t1_stop = time.time()
time_need = t1_stop - t1_start
print(time_need) #0.00365
```