Fast L1 all-to-all distance

The attention mechanism uses a dot product, which is a fast matrix multiplication (`tensor1 @ tensor2`). Is there an equally fast all-to-all L1 distance (`tensor1` L1 `tensor2`) function? L1 is just subtractions, so I'm surprised I can't find an implementation that's at least as fast as multiplication…
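For concreteness, here is a minimal PyTorch sketch of what I mean (shapes and function names are my own, for illustration): the matmul version versus the obvious broadcasting version of all-to-all L1, plus `torch.cdist` with `p=1`, which computes the same matrix. The broadcasting version materializes an `(n, m, d)` intermediate, which presumably is why it's so much slower than a single fused matmul.

```python
import torch

def dot_scores(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Dot-product "attention scores": a (n, d), b (m, d) -> (n, m).
    # One matmul, handled by highly optimized BLAS / tensor-core kernels.
    return a @ b.T

def l1_broadcast(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # All-to-all L1 via broadcasting: builds an (n, m, d) intermediate,
    # so it is memory-bound and far slower than the matmul above.
    return (a.unsqueeze(1) - b.unsqueeze(0)).abs().sum(dim=-1)

def l1_cdist(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Built-in alternative: same (n, m) L1 distance matrix without the
    # explicit (n, m, d) intermediate.
    return torch.cdist(a, b, p=1)

a, b = torch.randn(512, 64), torch.randn(512, 64)
assert torch.allclose(l1_broadcast(a, b), l1_cdist(a, b), atol=1e-5)
```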
