How to optimally implement a pairwise custom kernel operation?

Hi,

Given 1D differentiable vectors A=[Nx1] and B=[Mx1], I am looking to compute a pairwise kernel operation:

def kernel(a, b):
    return a * b * torch.exp(-torch.abs(a - b) / 0.4)

Is there a way to avoid looping over individual items? I need to perform the kernel operation for pairwise entries in {A, A}, {A, B} and {B, B} which might make it computationally heavy if done iteratively.

You could try to use broadcasting as seen here:

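import torch

a = torch.arange(4).float().view(4, 1)
b = torch.arange(4).float().view(4, 1)

# element-wise: result has shape [4, 1]
print(a - b)

# pair-wise: a.unsqueeze(1) has shape [4, 1, 1] and broadcasts
# against b ([4, 1]) to a [4, 4, 1] result, where out[i, j, 0] = a[i] - b[j]
print(a.unsqueeze(1) - b)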

This would result in a higher memory footprint, since the pairwise result holds N*M entries instead of N, but might be faster than your sequential approach.
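Applied to the kernel from the question, the same broadcasting idea could look roughly like the sketch below. The function name `pairwise_kernel`, the `scale` argument, and the sizes N=5 and M=3 are placeholders I picked for illustration, not anything from the original post. It reshapes one input to a column and the other to a row, so every operation broadcasts to an [N, M] matrix and autograd still tracks both inputs.

```python
import torch

def pairwise_kernel(x, y, scale=0.4):
    # Hypothetical helper: x has N entries, y has M entries.
    x = x.reshape(-1, 1)  # column, shape [N, 1]
    y = y.reshape(1, -1)  # row, shape [1, M]
    # Every op broadcasts to [N, M]; entry (i, j) is kernel(x[i], y[j]).
    return x * y * torch.exp(-torch.abs(x - y) / scale)

# Example sizes are arbitrary assumptions.
A = torch.randn(5, 1, requires_grad=True)
B = torch.randn(3, 1, requires_grad=True)

K_AA = pairwise_kernel(A, A)  # shape [5, 5]
K_AB = pairwise_kernel(A, B)  # shape [5, 3]
K_BB = pairwise_kernel(B, B)  # shape [3, 3]

# Gradients flow back to A and B through the broadcasted ops.
(K_AA.sum() + K_AB.sum() + K_BB.sum()).backward()
```

If N*M gets too large to fit in memory, one option would be to call the same function on slices of A in a loop and concatenate the results, trading some speed for a smaller peak footprint.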