You have a step function, so its derivative is 0 almost everywhere
and undefined at the “interesting” point where the step takes place.
What would you want the gradients to be? How would you use such
gradients in a gradient-descent optimization?
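For concreteness, here is a minimal numerical check (a sketch assuming NumPy): central differences around a hard step give a derivative of exactly 0 at every point away from the jump, so there is nothing for an optimizer to follow.

```python
import numpy as np

def step(x):
    # Hard step: 0 for x < 0, 1 for x > 0 (0.5 exactly at 0)
    return np.heaviside(x, 0.5)

# Numerical derivative via central differences
h = 1e-6
for x0 in [-1.0, -0.1, 0.1, 1.0]:
    d = (step(x0 + h) - step(x0 - h)) / (2 * h)
    print(f"x = {x0:+.1f}  d(step)/dx ~ {d}")  # prints 0.0 for all of these points
```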
For backpropagation / gradient descent to work, your functions need
to be usefully differentiable. The typical approach in cases where you
“want” a step function is to use a differentiable “soft” approximation to
the step function, such as sigmoid() or tanh().
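As a rough sketch (assuming PyTorch, with a hypothetical steepness factor k): backpropagating through sigmoid(k * x) yields nonzero gradients everywhere, so gradient descent has something to follow. Larger k makes the curve look more like a hard step, at the cost of tiny gradients far from the transition.

```python
import torch

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
k = 10.0  # steepness; larger k -> closer to a hard step, but smaller gradients in the tails
y = torch.sigmoid(k * x)  # smooth stand-in for the step function

y.sum().backward()
print(x.grad)  # nonzero everywhere, largest near the transition at x = 0
```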