Torch.round() gradient

Both functions are differentiable almost everywhere. Think about what their graphs look like.

round() is a step function, so its derivative is zero almost everywhere. Even though it is differentiable (almost everywhere), that gradient is zero, so it is not useful for learning.
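You can verify this directly; a minimal sketch (the tensor values here are just arbitrary examples):

```python
import torch

# round() is flat between half-integer steps, so autograd
# backpropagates a gradient of zero everywhere.
x = torch.tensor([0.3, 1.7, 2.5], requires_grad=True)
torch.round(x).sum().backward()
print(x.grad)  # tensor([0., 0., 0.])
```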

clamp() is linear with slope 1 inside (min, max) and flat outside that range, so its derivative is 1 inside (min, max) and zero outside. It can be useful for learning as long as enough of the input falls inside the range.
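Again, a quick check (values chosen to fall strictly inside and outside the clamp range):

```python
import torch

# clamp() passes the gradient through (slope 1) where the input
# is inside (min, max) and blocks it (slope 0) where it is clamped.
x = torch.tensor([-2.0, 0.5, 3.0], requires_grad=True)
torch.clamp(x, min=-1.0, max=1.0).sum().backward()
print(x.grad)  # tensor([0., 1., 0.])
```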
