Scatter_add with two index tensors

Dear all,

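Suppose I have two index tensors of shape [K] and a src tensor of shape [M, K], defined by

import torch

M, N, K = 1000, 100, 512
target = torch.zeros(M, N, N)
src = torch.randn(M, K)
# one index pair per column of src, so both index tensors need length K
index1 = torch.randint(0, N, (K,))
index2 = torch.randint(0, N, (K,))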

How could I perform

for i in range(M):
  for j in range(K):
    target[i,index1[j],index2[j]] += src[i,j] 

using tensor.scatter_add_ or any other vectorized operation to speed this up? In my actual computation M and K are very large.

I think flattening the last two dimensions of target and building a single combined index tensor (index1 * N + index2) should work. (I’m not at my workstation, so I can’t post a code snippet showing this approach right now.)
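
For anyone finding this later, here is a minimal sketch of the flattening idea, not a definitive implementation. It assumes both index tensors have length K (one index pair per column of src) and uses small sizes so the loop from the question can serve as a check. The name flat_index is my own. Since the same index applies to every row, index_add_ might be a simpler alternative to scatter_add_, so both are shown.

```python
import torch

# Small sizes for illustration; the question uses M, N, K = 1000, 100, 512.
M, N, K = 8, 5, 12
src = torch.randn(M, K)
index1 = torch.randint(0, N, (K,))  # assumed length K, one entry per column of src
index2 = torch.randint(0, N, (K,))

# Combine the (row, col) pair into one linear index over the flattened N * N axis.
flat_index = index1 * N + index2  # shape (K,), values in [0, N * N)

# Option 1: scatter_add_ expects an index with the same shape as src,
# so the 1-D index is broadcast across the M rows.
target = torch.zeros(M, N * N)
target.scatter_add_(1, flat_index.unsqueeze(0).expand(M, K), src)
target = target.view(M, N, N)

# Option 2: index_add_ accepts the 1-D index directly along dim 1.
target2 = torch.zeros(M, N * N)
target2.index_add_(1, flat_index, src)
target2 = target2.view(M, N, N)

# Reference: the double loop from the question.
expected = torch.zeros(M, N, N)
for i in range(M):
    for j in range(K):
        expected[i, index1[j], index2[j]] += src[i, j]

print(torch.allclose(target, expected, atol=1e-5))
print(torch.allclose(target2, expected, atol=1e-5))
```

Both variants accumulate duplicate index pairs, which matches the += in the loop. On GPU the accumulation order is not deterministic, so results may differ from the loop by floating-point rounding.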