Find the 10 nearest neighbors for each point in a point cloud comprising 6M points

miguelG_97 · March 5, 2022, 2:21am

Greetings to all. I need to verify that the distances between data points meet some requirements. Using pandas and NumPy together takes roughly 20 min for a set of 10000 points on a CPU. Now that I have switched to PyTorch on a Google Colab GPU, the run time has not dropped at all; in fact, it takes longer than the former approach. I'm attaching the script I'm currently using. Any thoughts on how to improve the performance?

import pandas as pd
import numpy as np
import torch

# determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')  # don't have GPU
    return device

# a) read the text file
pointCloud = pd.read_csv(r'/content/500K POINTS.txt', index_col=0)

# b) order the data set by its Y values
newdfSorted = pointCloud.sort_values(by=['Ycoord'], ascending=True, ignore_index=True)
newdfSorted = newdfSorted.iloc[39900:50000]

# c) get the total number of rows
rows, colums = newdfSorted.shape

# d) convert the DataFrame to a torch tensor
myDevice = get_device()
dfTensor = torch.from_numpy(newdfSorted[['Xcoord', 'Ycoord', 'Zcoord']].values).to(device=myDevice, dtype=torch.float)

# h.1) create an empty tensor to hold the 10 nearest distances for each point
distanceTensor = torch.empty((rows, 10), dtype=torch.float, device=myDevice)

for i in range(0, rows, 1):
    # e) pick a tensor row (a data point)
    testPoint = dfTensor[i]

    # f) build a new tensor that excludes the point selected in (e) and holds
    #    the points up to 1000 positions behind and ahead of the test point index
    if i - 1000 < 0:
        # the negative start index wraps around to the end of the tensor;
        # the test is strict so that i == 1000 takes the else branch
        df1 = dfTensor[(i - 1000):]
        df2 = dfTensor[0:i]
        df3 = dfTensor[i + 1:i + 1000]  # i+1 excludes the current point to avoid a zero distance
        newDataFrame = torch.cat([df1, df2, df3], dim=0).to(device=myDevice)
    else:
        df1 = dfTensor[(i - 1000):i]
        df2 = dfTensor[i + 1:i + 1000]  # i+1 excludes the current point to avoid a zero distance
        newDataFrame = torch.cat([df1, df2], dim=0).to(device=myDevice)

    # g) determine the 10 closest points to the point at loop index i
    distance1 = torch.norm(testPoint - newDataFrame[0], p='fro')
    firstV = torch.tensor([distance1 * 100]).to(device=myDevice)
    for j in range(1, newDataFrame.size(dim=0), 1):
        distance = torch.norm(testPoint - newDataFrame[j], p='fro').to(device=myDevice)
        firstV = torch.cat([firstV, torch.tensor([distance * 100]).to(device=myDevice)]).to(device=myDevice)
    values, indices = torch.sort(firstV, dim=0, descending=False)
    nearDistances = values[0:10]

    # h.2) store the 10 closest distances in the tensor created in (h.1)
    distanceTensor[i] = nearDistances

# i) join the original tensor from the text file with the nearest-distance tensor along the horizontal axis
torch.concat([dfTensor, distanceTensor], dim=1).to(device=myDevice)
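
One likely reason the GPU version is slow is that the script runs a Python loop over every candidate point, launching tiny operations one at a time. A possible direction is to compute the distances in batches instead. The sketch below is an assumption-laden illustration, not the script's method: the function name `knn_distances` and the `chunk_size` parameter are hypothetical, it searches all points rather than the ±1000 window used above, and it leaves out the `* 100` scaling. It relies on `torch.cdist` for pairwise Euclidean distances and `torch.topk` for the smallest values per row.

```python
import torch


def knn_distances(points: torch.Tensor, k: int = 10, chunk_size: int = 1024) -> torch.Tensor:
    """Return the k smallest Euclidean distances from each point to the others.

    points: (N, 3) float tensor on any device.
    Result: (N, k) tensor, each row sorted in ascending order.
    Hypothetical helper; `chunk_size` bounds memory to chunk_size * N distances.
    """
    n = points.shape[0]
    out = torch.empty((n, k), dtype=points.dtype, device=points.device)
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # (chunk, N) matrix of distances from this chunk of query points to all points
        dists = torch.cdist(points[start:stop], points)
        # mask each point's distance to itself so it is not reported as a neighbour
        rows = torch.arange(stop - start, device=points.device)
        dists[rows, rows + start] = float("inf")
        # k smallest distances per row (largest=False picks the minima)
        out[start:stop] = torch.topk(dists, k, dim=1, largest=False).values
    return out
```

In terms of the script above, this would be called as `distanceTensor = knn_distances(dfTensor) * 100`. Memory grows with `chunk_size` times the number of points, so for the full 6M-point cloud the chunk would have to be small, or the search restricted to a window or a spatial index, which this sketch does not attempt.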