L2 distance in pixel space

I’m trying to create a loss function that measures the Euclidean distance between points, where the points are given not as coordinates but as activated pixels in a 2D map. For example:

import torch

a = torch.zeros(1, 1, 5, 5)  # (B, C, H, W)
b = torch.zeros(1, 1, 5, 5)  # (B, C, H, W)
a[0, 0, 1, 1] = 1.0
b[0, 0, 3, 4] = 1.0
loss(a, b)
# Expected output: tensor([3.60555])

I can easily do this calculation with a combination of nonzero() and pow(2).sum().sqrt(). However, nonzero() returns integer indices, so I don’t think I can backpropagate through it, right?
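For reference, the non-differentiable version I mean looks roughly like this (just a sketch; naive_distance is an illustrative name):

import torch

def naive_distance(a, b):
    # nonzero() returns integer index tensors, which detaches the result
    # from the computation graph -- no gradient flows back through it.
    p_a = a.nonzero()[0, 2:].float()  # (row, col) of the activated pixel in a
    p_b = b.nonzero()[0, 2:].float()  # (row, col) of the activated pixel in b
    return (p_b - p_a).pow(2).sum().sqrt()

# naive_distance(a, b) -> tensor(3.6056), but calling backward() on it
# produces no gradient w.r.t. a or b.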

How could I set up this loss function to train a network that minimizes the L2 distance between activated pixels, so that predictions are encouraged to overlap with the ground-truth binary mask?

Will there always be exactly two activated pixels?

No, but I can “calculate” the assignment into the pairs I’m interested in.


How about converting your pixel representation into coordinates as a first step, and then taking the distance? I believe this should work:

import torch

a = torch.zeros(1, 1, 5, 5)  # (B, C, H, W)
b = torch.zeros(1, 1, 5, 5)  # (B, C, H, W)

a[0, 0, 1, 1] = 1.0
b[0, 0, 3, 4] = 1.0

# The matmul with the index vector contracts the H dimension, so the
# weighted sum picks out the activated row index; permuting H and W
# does the same for the column index. Note this only yields the true
# coordinates when each map sums to 1.
x_a = (torch.arange(5).float() @ a).sum()
y_a = (torch.arange(5).float() @ a.permute([0, 1, 3, 2])).sum()

x_b = (torch.arange(5).float() @ b).sum()
y_b = (torch.arange(5).float() @ b.permute([0, 1, 3, 2])).sum()

((x_b - x_a).pow(2) + (y_b - y_a).pow(2)).sqrt()

Output:
tensor(3.6056)
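To use this for training, where the network output won’t be an exact one-hot map, you could normalize each map so it sums to 1 before taking the weighted sums (a soft-argmax). A minimal sketch, assuming non-negative maps with one point of interest per channel (point_loss is just a name I picked):

import torch

def point_loss(pred, target, eps=1e-8):
    # Normalize each (H, W) map to sum to 1, then take the expected
    # (row, col) coordinate -- a differentiable soft-argmax.
    B, C, H, W = pred.shape

    def expected_xy(m):
        m = m.clamp(min=0)
        m = m / (m.sum(dim=(2, 3), keepdim=True) + eps)
        rows = torch.arange(H, dtype=m.dtype)
        cols = torch.arange(W, dtype=m.dtype)
        y = (m.sum(dim=3) * rows).sum(dim=2)  # expected row index, shape (B, C)
        x = (m.sum(dim=2) * cols).sum(dim=2)  # expected column index, shape (B, C)
        return x, y

    x_p, y_p = expected_xy(pred)
    x_t, y_t = expected_xy(target)
    return ((x_p - x_t).pow(2) + (y_p - y_t).pow(2)).sqrt().mean()

For the example above, point_loss(a, b) returns tensor(3.6056), and every operation is differentiable, so gradients flow back into the prediction.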

Thank you, this looks good!