Hey all! I’ve been experimenting with different Minkowski distances (L1, L2, etc.) as loss functions for a project, and one thing I wanted to try was a distance with an exponent below 1 (i.e. "less than L1"). So, instead of the per-element distance being |x-y| as in L1, I’d have, for example, sqrt(|x-y|).

My problem is that when I take the difference of the arrays x and y, the resulting values are all below 1 in magnitude, which means the square root actually makes those values larger rather than smaller. I can train with distances such as |x-y| or |x-y|^2, but if I try sqrt(|x-y|), the loss goes to NaN very quickly and no training happens. Is it simply not possible to train because of the way the square root inflates small differences, or is there some way to stabilize it?
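
To make the question concrete, here is a minimal sketch of the three losses in plain Python. It assumes a mean reduction over elements; the helper names `minkowski_loss` and `grad_magnitude` and the sample values are made up for illustration and are not the actual training code. It also prints the size of the derivative of r^p with respect to the residual magnitude r, which for p = 0.5 grows without bound as r approaches 0 and is undefined at exactly r = 0. That may or may not be related to the NaN.

```python
def minkowski_loss(x, y, p):
    """Mean of |x_i - y_i|**p over all elements (hypothetical helper)."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) / len(x)


def grad_magnitude(r, p):
    """|d/dr r**p| = p * r**(p - 1), valid only for a residual magnitude r > 0."""
    return p * r ** (p - 1)


# Made-up sample arrays whose differences are all below 1 in magnitude.
x = [0.50, 0.20, 0.90]
y = [0.49, 0.25, 0.40]

# Loss for p = 0.5 (square root), p = 1 (L1-style), p = 2 (squared).
for p in (0.5, 1.0, 2.0):
    print(f"p={p}: loss={minkowski_loss(x, y, p):.4f}")

# Derivative magnitude for p = 0.5 as the residual shrinks toward zero.
for r in (1e-1, 1e-4, 1e-8):
    print(f"r={r:g}: |d/dr sqrt(r)|={grad_magnitude(r, 0.5):.1f}")
```

For residuals below 1, the p = 0.5 loss is the largest of the three and the p = 2 loss the smallest, while the p = 0.5 derivative keeps growing as the residual gets smaller.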

Thanks in advance!