I’m trying to create a loss function that measures the euclidean distance of points (not based on coordinates but) based on activated pixels in a 2D map. So, for example:

a = torch.zeros(1, 1, 5, 5) # (B,C,H,W)
b = torch.zeros(1, 1, 5, 5) # (B,C,H,W)
a[0,0,1,1] = 1.0
b[0,0,3,4] = 1.0
loss(a, b)
# Expected Output: tensor([3.60555])

I can easily do this calculation with a combination of nonzero() and pow(2).sum().sqrt(). However, I don’t think that I can backpropagate anything, right?

How could I set up this loss function to train a network that minimizes the l2 loss between activated pixels to encourage predictions that overlap with the ground truth binary mask?