# How to efficiently assign features from a 3x3 window in another matrix

Hello,

I have a point cloud, i.e. nodes each with their own coordinates (x, y). So, for instance, we have a point cloud:

``````
X = [[1, 1],
     [2, 2],
     [3, 4]]
``````

Then I have a large matrix that holds some features, let’s say an 8×8 matrix:

``````
M = [[1, 2, 3, 2, 5, 6, 7, 8],
     [4, 5, 6, 1, 8, 1, 2, 4],
     [4, 5, H, 3, 8, 1, 2, 4],
     [1, 5, 4, 4, 1, 1, 2, 3],
     [2, 5, 2, 7, 1, 1, 4, 3],
     [3, 5, 4, 8, 5, 1, 4, 3],
     [4, 4, 3, 6, 4, 1, 1, 3],
     [4, 4, 6, 5, 8, 1, 3, 5]]
``````

From the task I know that the features around each coordinate are important for the point cloud (let’s say the “kernel size” is 3), and I want to get the flattened features from those positions. So for point [2, 2] we’d get the 3×3 window values 5, 6, 1, 5, H, 3, 5, 4, 4. Then I want to concatenate them to the point-cloud features, so for point [2, 2] we’d get [2, 2, 5, 6, 1, 5, H, 3, 5, 4, 4].

I know roughly how to do that on the CPU, but is there a good algorithm to do it efficiently? It would be done on the fly during training, on the GPU.
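For comparison, the straightforward CPU version with plain slicing might look like this (a minimal sketch using a dummy `M`; it assumes every point lies at least one cell away from the border, so no padding is handled):

```python
import numpy as np

# Dummy feature matrix and points for illustration.
M = np.arange(64).reshape(8, 8)
X = [(1, 1), (2, 2), (3, 4)]

out = []
for r, c in X:
    # Slice out the 3x3 window centered on (r, c) and flatten it.
    window = M[r - 1:r + 2, c - 1:c + 2].reshape(-1)
    # Prepend the point's own coordinates.
    out.append(np.concatenate(([r, c], window)))
out = np.stack(out)
print(out.shape)  # (3, 11): 2 coordinates + 9 window values per point
```

This per-point loop is exactly what the unfold-based GPU version below replaces with one batched operation.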

thanks!

I think this achieves the spirit of what you’re trying to accomplish, but some tweaks might be needed depending on how you want to handle the `H`/center of each patch, etc.

``````
import torch

X = [[1, 1],
     [2, 2],
     [3, 4]]

M = [[1, 2, 3, 2, 5, 6, 7, 8],
     [4, 5, 6, 1, 8, 1, 2, 4],
     [4, 5, 0, 3, 8, 1, 2, 4],
     [1, 5, 4, 4, 1, 1, 2, 3],
     [2, 5, 2, 7, 1, 1, 4, 3],
     [3, 5, 4, 8, 5, 1, 4, 3],
     [4, 4, 3, 6, 4, 1, 1, 3],
     [4, 4, 6, 5, 8, 1, 3, 5]]

X = torch.tensor(X, device='cuda', dtype=torch.float)
M = torch.tensor(M, device='cuda', dtype=torch.float)

# Extract every 3x3 patch; padding=1 gives one patch per pixel.
unfold = torch.nn.Unfold(kernel_size=3, padding=1)
M2 = unfold(M.reshape(1, 1, M.size(0), M.size(1)))  # (1, 9, 64)

# Convert each (row, col) coordinate to a flat patch index.
X_flat = X[:, 0] * M.size(1) + X[:, 1]
M2 = M2.permute(0, 2, 1)       # (1, 64, 9)
out = M2[:, X_flat.long(), :]  # (1, 3, 9)
print(out.reshape(X.size(0), -1))
``````
``````
tensor([[1., 2., 3., 4., 5., 6., 4., 5., 0.],
        [5., 6., 1., 5., 0., 3., 5., 4., 4.],
        [3., 8., 1., 4., 1., 1., 7., 1., 1.]], device='cuda:0')
``````

Thank you, this is exactly what I wanted. I cannot get my head around the unfold operation… By the way, is this easily changed to an M with more channels (concatenating over channels as well)?

Thanks.

There should be no restriction on the number of channels, since unfold is basically doing a “convolution” without the reduction: each output column just grows to hold all channels of the patch.
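To illustrate, here is a minimal sketch of the same approach with a C-channel `M` (random dummy data, CPU tensors; the `C` and shapes are assumptions based on `nn.Unfold`’s documented output layout of `(N, C * kernel_h * kernel_w, L)`):

```python
import torch

C = 4                                 # hypothetical number of feature channels
M = torch.randn(C, 8, 8)              # multi-channel feature map
X = torch.tensor([[1, 1], [2, 2], [3, 4]], dtype=torch.float)

# Unfold expects (N, C, H, W); padding=1 keeps one patch per pixel.
unfold = torch.nn.Unfold(kernel_size=3, padding=1)
M2 = unfold(M.unsqueeze(0))           # (1, C*9, 64)
M2 = M2.permute(0, 2, 1)              # (1, 64, C*9)

# Same flat indexing as before; the patch dimension now spans all channels.
X_flat = X[:, 0] * M.size(2) + X[:, 1]
out = M2[:, X_flat.long(), :]         # (1, 3, C*9)
print(out.reshape(X.size(0), -1).shape)  # torch.Size([3, 36])
```

The per-point feature vector simply becomes `C * 9` long, so the concatenation with the point coordinates works unchanged.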