Minimum and Mean Euclidean distance between two tensors of different shape

Tanay_Rastogi · September 19, 2023, 1:19pm

I am quite new to PyTorch and am currently running into out-of-memory issues.

Task:
I have two 2D tensors of respective shapes A: [1000, 14] & B: [100000, 14].

I need the Euclidean distance from each row of tensor A to every row of tensor B. From those distances I then take, for each row of A, either the minimum or the mean distance to B, and finally average that value over all rows of A.

Current Solution:
My solution to calculate minimum distance:

dist = list()
for row_id in range(A.shape[0]):
    # Min distance of a row in A from B
    dist.append(torch.linalg.norm(A[row_id, :] - B, dim=1).min().item())
result = torch.FloatTensor(dist).mean()

And my solution to calculate the mean distance:

dist = list()
for row_id in range(A.shape[0]):
    # Mean distance of a row in A from B
    dist.append(torch.linalg.norm(A[row_id, :] - B, dim=1).mean().item())
result = torch.FloatTensor(dist).mean()

Issue:
This gives me the result, but it is either very slow (on the CPU) or often runs out of memory on the GPU. (I have a T4 GPU - 8GB)

Can you please recommend a better way to calculate the Euclidean distance that is faster and does not run out of memory?

Thanks!

KFrank · September 19, 2023, 8:51pm

Hi Tanay!

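Use torch.cdist(). It computes the pairwise distances between all rows of A and all rows of B in a single call, so no Python loop is needed.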

Consider this script:

import torch
print (torch.__version__)
print (torch.version.cuda)
print (torch.cuda.get_device_name())

_ = torch.manual_seed (2023)

A = torch.randn (1000, 14, device = 'cuda')
B = torch.randn (100000, 14, device = 'cuda')
print ('A.shape:         ', A.shape)
print ('B.shape:         ', B.shape)

dist = list()
for row_id in range(A.shape[0]):
    # Min distance of a row in A from B
    dist.append(torch.linalg.norm(A[row_id, :] - B, dim=1).min().item())

resultMinMean = torch.FloatTensor(dist).mean()

dist = list()
for row_id in range(A.shape[0]):
    # Mean distance of a row in A from B
    dist.append(torch.linalg.norm(A[row_id, :] - B, dim=1).mean().item())

resultMeanMean = torch.FloatTensor(dist).mean()

distAB = torch.cdist (A, B)
resultMinMeanB = distAB.min (dim = 1).values.mean()
resultMeanMeanB = distAB.mean()

print ('resultMinMean:   ', resultMinMean)
print ('resultMinMeanB:  ', resultMinMeanB)
print ('allclose? :      ', torch.allclose (resultMinMeanB.cpu(), resultMinMean))

print ('resultMeanMean:  ', resultMeanMean)
print ('resultMeanMeanB: ', resultMeanMeanB)
print ('allclose? :      ', torch.allclose (resultMeanMeanB.cpu(), resultMeanMean))

And its output:

2.0.1
11.8
GeForce GTX 1050 Ti
A.shape:          torch.Size([1000, 14])
B.shape:          torch.Size([100000, 14])
resultMinMean:    tensor(1.8380)
resultMinMeanB:   tensor(1.8380, device='cuda:0')
allclose? :       True
resultMeanMean:   tensor(5.2006)
resultMeanMeanB:  tensor(5.2006, device='cuda:0')
allclose? :       True
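
For the shapes in this thread the full distance matrix has 1000 x 100000 float32 entries, which is about 400 MB. If A or B were much larger, that matrix might no longer fit in GPU memory. One possible workaround, sketched below, is to call torch.cdist() on a block of rows of A at a time and keep only the per-row minimum and mean. The function name chunked_distance_stats and the chunk_size parameter are made up for this illustration and are not part of PyTorch. Small CPU tensors are used so that the sketch runs anywhere.

```python
import torch

def chunked_distance_stats(A, B, chunk_size=256):
    """Hypothetical helper: mean over rows of A of the min / mean distance to B.

    Only a [chunk_size, B.shape[0]] block of distances exists at any time.
    """
    row_min = []
    row_mean = []
    for start in range(0, A.shape[0], chunk_size):
        # Distances from this block of rows of A to every row of B
        d = torch.cdist(A[start:start + chunk_size], B)
        row_min.append(d.min(dim=1).values)   # per-row minimum distance
        row_mean.append(d.mean(dim=1))        # per-row mean distance
    return torch.cat(row_min).mean(), torch.cat(row_mean).mean()

torch.manual_seed(0)
A = torch.randn(50, 14)    # small stand-ins for the [1000, 14] and
B = torch.randn(400, 14)   # [100000, 14] tensors from the question

min_mean, mean_mean = chunked_distance_stats(A, B, chunk_size=16)

# Reference values from the full distance matrix, as in the script above
full = torch.cdist(A, B)
print(torch.allclose(min_mean, full.min(dim=1).values.mean(), atol=1e-4))
print(torch.allclose(mean_mean, full.mean(), atol=1e-4))
```

A larger chunk_size means fewer kernel launches but a bigger temporary block, so the value is something to tune against the available memory.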

Best.

K. Frank

Tanay_Rastogi · September 20, 2023, 11:38am

Thanks a lot! This works like a charm, and it also gave my code a very good speed boost. Thanks!