I am quite new to PyTorch and am currently running into memory-overflow issues.
Task:
I have two 2D tensors of shapes A: [1000, 14] and B: [100000, 14].
I have to find the distance of each row of tensor A from all rows of tensor B. Then, using the calculated distances, I compute either the mean of the minimum distances or the mean of the mean distances of each row of tensor A from tensor B.
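To make the task concrete, here is a minimal sketch of the two quantities I am after, on tiny made-up tensors (the shapes are only for illustration, and torch.cdist is used here just to spell out the definition, not as my actual code):

import torch

A = torch.randn(4, 3)   # stands in for my [1000, 14] tensor
B = torch.randn(6, 3)   # stands in for my [100000, 14] tensor

# Full pairwise Euclidean distance matrix: D[i, j] = ||A[i] - B[j]||_2, shape [4, 6]
D = torch.cdist(A, B)

mean_of_min = D.min(dim=1).values.mean()   # mean over rows of A of the minimum distance to B
mean_of_mean = D.mean(dim=1).mean()        # mean over rows of A of the mean distance to B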
Current Solution:
My solution for the mean of the minimum distances:
dist = list()
for row_id in range(A.shape[0]):
    # Minimum distance of this row of A from all rows of B
    dist.append(torch.linalg.norm(A[row_id, :] - B, dim=1).min().item())
result = torch.FloatTensor(dist).mean()
And my solution for the mean of the mean distances:
dist = list()
for row_id in range(A.shape[0]):
    # Mean distance of this row of A from all rows of B
    dist.append(torch.linalg.norm(A[row_id, :] - B, dim=1).mean().item())
result = torch.FloatTensor(dist).mean()
Issue:
This gives me the result, but it is either very slow (when run on the CPU) or often leads to an out-of-memory error when run on the GPU (I have a T4 GPU with 8 GB of memory).
Can you please recommend a better way to calculate these Euclidean distances that is faster and does not run into memory-overflow issues?