Greetings, all. I need to check that the distances between the points of a point cloud meet certain requirements; for each point I compute the distances to its 10 nearest neighbours. My pandas + NumPy version takes roughly 20 minutes for a set of 10,000 points on a CPU. I have since ported the code to PyTorch and run it on a Google Colab GPU, but the runtime has not dropped at all; it is actually slower than the CPU version. The script I'm currently using is below. Any thoughts on how to improve the performance?
import pandas as pd
import numpy as np
import torch

# Determine the supported device
def get_device():
    if torch.cuda.is_available():
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')  # no GPU available
    return device

# a) Read the text file
pointCloud = pd.read_csv(r'/content/500K POINTS.txt', index_col=0)

# b) Sort the data set by its Y values and keep a slice of it
newdfSorted = pointCloud.sort_values(by=['Ycoord'], ascending=True, ignore_index=True)
newdfSorted = newdfSorted.iloc[39900:50000]

# c) Get the total number of rows
rows, columns = newdfSorted.shape

# d) Convert the DataFrame to a torch tensor on the selected device
myDevice = get_device()
dfTensor = torch.from_numpy(newdfSorted[['Xcoord', 'Ycoord', 'Zcoord']].values).to(device=myDevice, dtype=torch.float)

# h.1) Create an empty tensor to hold the 10 closest distances of every point
distanceTensor = torch.empty((rows, 10), dtype=torch.float, device=myDevice)

for i in range(0, rows, 1):
    # e) Pick one tensor row (a single point)
    testPoint = dfTensor[i]

    # f) Build a tensor that excludes the point selected in (e) and contains
    #    the 1000 points before it and the 1000 points after it
    if i - 1000 < 0:
        # Fewer than 1000 points precede i, so wrap around to the end of the tensor
        df1 = dfTensor[(i - 1000):]
        df2 = dfTensor[0:i]
        df3 = dfTensor[i + 1:i + 1001]  # i+1 excludes the current point to avoid a zero distance
        newDataFrame = torch.cat([df1, df2, df3], dim=0)
    else:
        df1 = dfTensor[(i - 1000):i]
        df2 = dfTensor[i + 1:i + 1001]  # i+1 excludes the current point to avoid a zero distance
        newDataFrame = torch.cat([df1, df2], dim=0)

    # g) Find the distances (scaled by 100) to the 10 points closest to point i
    distance1 = torch.norm(testPoint - newDataFrame[0], p='fro')
    firstV = torch.tensor([distance1 * 100], device=myDevice)
    for j in range(1, newDataFrame.size(dim=0), 1):
        distance = torch.norm(testPoint - newDataFrame[j], p='fro')
        firstV = torch.cat([firstV, torch.tensor([distance * 100], device=myDevice)])
    values, indices = torch.sort(firstV, dim=0, descending=False)
    nearDistances = values[0:10]

    # h.2) Store the 10 closest distances in the tensor created in (h.1)
    distanceTensor[i] = nearDistances

# i) Join the original coordinate tensor and the closest-distance tensor along the columns
result = torch.concat([dfTensor, distanceTensor], dim=1)
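A likely reason the GPU version is no faster is the inner `j` loop in step (g): every iteration launches a few tiny kernels, `torch.tensor([distance * 100])` turns a GPU scalar into a new tensor (which needs a device-to-host copy and therefore a synchronisation), and `torch.cat` copies the growing `firstV` each time. The work is dominated by per-call overhead rather than arithmetic, which a GPU cannot hide. Below is a minimal sketch of a batched alternative, assuming the window described in step (f) (1000 points behind each point with wrap-around to the end of the tensor, 1000 points ahead truncated at the end), the same ×100 scaling, and more than 1000 rows. `windowed_knn_distances` and its parameters are hypothetical names, not part of any library.

```python
import torch


def windowed_knn_distances(points: torch.Tensor, window: int = 1000, k: int = 10,
                           scale: float = 100.0, batch: int = 1024) -> torch.Tensor:
    """Hypothetical helper. For every row i of `points` (shape [N, 3]) return the
    k smallest scaled Euclidean distances to the `window` rows before i (wrapping
    to the end of the tensor when i < window) and the `window` rows after i
    (truncated at the end). Assumes N > window."""
    n = points.shape[0]
    device = points.device
    # Offsets -window..-1 and 1..window; offset 0 (the point itself) is skipped.
    offsets = torch.cat([torch.arange(-window, 0, device=device),
                         torch.arange(1, window + 1, device=device)])
    out = torch.empty((n, k), dtype=points.dtype, device=device)
    for start in range(0, n, batch):
        query = torch.arange(start, min(start + batch, n), device=device)
        idx = query[:, None] + offsets[None, :]       # [B, 2*window] neighbour indices
        idx = torch.where(idx < 0, idx + n, idx)      # wrap "behind" indices to the end
        valid = idx < n                               # "ahead" indices past the end do not exist
        neighbours = points[idx.clamp(max=n - 1)]     # [B, 2*window, 3]
        dist = torch.linalg.norm(neighbours - points[query][:, None, :], dim=2) * scale
        dist = dist.masked_fill(~valid, float('inf'))  # never select the missing points
        # k smallest distances per row, returned in ascending order
        out[query] = torch.topk(dist, k, dim=1, largest=False).values
    return out
```

With `dfTensor` from step (d), `distanceTensor = windowed_knn_distances(dfTensor)` would replace the whole `for i` loop, and `torch.cat([dfTensor, distanceTensor], dim=1)` gives the joined tensor of step (i). Each chunk gathers `batch × 2000 × 3` floats (about 24 MB for `batch=1024`), so `batch` can be lowered or raised to fit the GPU memory.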
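If the ±1000-row window in step (f) is only there to keep the search affordable, it may be possible to drop it and find the exact 10 nearest neighbours among all points instead. That is a different technique from the script (a brute-force search over the whole set rather than a window of the Y-sorted data), sketched here under the assumption that the points fit in device memory; `exact_knn_distances` is a hypothetical helper built on `torch.cdist` and `torch.topk`.

```python
import torch


def exact_knn_distances(points: torch.Tensor, k: int = 10, scale: float = 100.0,
                        batch: int = 1024) -> torch.Tensor:
    """Hypothetical helper. Return the k smallest scaled Euclidean distances from
    every row of `points` to all other rows, one block of query rows at a time."""
    n = points.shape[0]
    out = torch.empty((n, k), dtype=points.dtype, device=points.device)
    for start in range(0, n, batch):
        stop = min(start + batch, n)
        dist = torch.cdist(points[start:stop], points) * scale  # [B, N] pairwise distances
        # Exclude each query point itself (its distance to itself would be zero).
        local = torch.arange(stop - start, device=points.device)
        dist[local, local + start] = float('inf')
        # k smallest distances per row, returned in ascending order
        out[start:stop] = torch.topk(dist, k, dim=1, largest=False).values
    return out
```

For the 10,100 rows selected in step (b), each block of 1024 query rows needs a 1024 × 10,100 distance matrix (about 41 MB in float32). For larger inputs `torch.cdist` computes Euclidean distances through a matrix multiplication by default, which is fast but can lose precision for very close points; adding `compute_mode='donot_use_mm_for_euclid_dist'` to the `torch.cdist` call trades speed for accuracy.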