Hi all,
Back in 2017, it was decided that torch.norm would have a zero subgradient at zero (Norm subgradient at 0 by albanD · Pull Request #2775 · pytorch/pytorch · GitHub).
Applying the same logic, shouldn’t torch.hypot have a zero subgradient at (0, 0)?
Currently, torch.hypot produces NaN gradients for (0, 0) inputs, even though it is otherwise equivalent to torch.norm applied to the stacking of the two tensors.
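A minimal reproduction of the discrepancy (behavior as of the PyTorch version I tested; `torch.norm`'s zero subgradient follows from the PR linked above):

```python
import torch

# torch.hypot at (0, 0): the backward formula x / hypot(x, y)
# evaluates to 0 / 0, so both gradients come out NaN.
x = torch.zeros((), requires_grad=True)
y = torch.zeros((), requires_grad=True)
torch.hypot(x, y).backward()
print(x.grad, y.grad)  # NaN gradients

# torch.norm of the stacked pair at zero: subgradient defined as 0,
# per pytorch/pytorch#2775.
z = torch.zeros(2, requires_grad=True)
torch.norm(z).backward()
print(z.grad)  # zero gradients
```

The same argument that justified the convention for torch.norm (picking 0 out of the subdifferential at the non-differentiable point) would seem to apply directly here.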