RuntimeError: derivative for aten::heaviside is not implemented

Hi Folks,

I was trying to implement a piecewise smooth function with torch.heaviside(), but PyTorch gives me an error message saying that heaviside does not have a derivative.

I know that mathematically the derivative of the Heaviside function at its discontinuity should be considered infinite, but is there any version of PyTorch that would produce something that works similarly?



Hi Daqian!

You are correct that the derivative for heaviside() is not implemented.*

However, if I understand your use case correctly, you do not need to
backpropagate through heaviside() itself, rather, just through the pieces
of your piecewise function.

There are a number of ways to do this, but you can do it with heaviside()
if you protect the call to heaviside() with a torch.no_grad() block.

Here is a script that illustrates backpropagating through a piecewise
function built using heaviside():

import torch
print (torch.__version__)

def  square_cube (x):   # a piecewise-smooth function
    with torch.no_grad():
        heavy = torch.heaviside (x, torch.tensor ([0.5]))
    return  (1.0 - heavy) * x**2  +  heavy * x**3

t = torch.arange (-3.0, 3.1, 1.0, requires_grad = True)
print (t)
sc = square_cube (t)
sc.sum().backward()   # populates t.grad; without this call t.grad is None
print (t.grad)

And here is its output (omitting the version line):

tensor([-3., -2., -1.,  0.,  1.,  2.,  3.], requires_grad=True)
tensor([-6., -4., -2.,  0.,  3., 12., 27.])
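
As a sketch of one of the other ways to do this, torch.where() can select
between the pieces directly, so heaviside() is not needed at all. The name
square_cube_where is made up for this illustration. The boolean mask carries
no gradient, so autograd differentiates only the piece that was selected
for each element.

```python
import torch

def square_cube_where(x):
    # x**2 for negative inputs, x**3 otherwise. The comparison x < 0
    # produces a boolean mask, which takes no part in backpropagation.
    return torch.where(x < 0, x**2, x**3)

t = torch.arange(-3.0, 3.1, 1.0, requires_grad=True)
square_cube_where(t).sum().backward()
print(t.grad)
```

For this particular function the two pieces agree at x = 0 (both are zero
there, as are their derivatives), so the gradient should match the
heaviside() version at every point, including the boundary.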

*) heaviside() is differentiable, with derivative zero everywhere except
for its discontinuity. So it’s arguable that pytorch should implement its
derivative. However, with zero derivative, backpropagation through
heaviside() isn’t really useful for anything, so not implementing its
derivative isn’t an unreasonable choice. It does break your use case,
but the torch.no_grad() work-around seems adequate.


K. Frank

Hi Frank,

I appreciate your insight!

It is my fault that I did not specify what I wanted. I was actually trying to model a piecewise smooth function and take the derivative of the function I am modeling. I think approximating torch.heaviside() with a very steep torch.sigmoid() may be a better option for me, since I need the derivative (and even the second derivative) of that function. I also realized that even if the derivative of heaviside() were implemented, the delta function would still cause problems because I need the second derivative.

Still, I appreciate your insight!
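
For anyone landing here later, the steep-sigmoid idea could look roughly like the sketch below. The names smooth_step and square_cube_smooth and the steepness k = 50 are assumptions made for illustration, not something from the posts above. The second derivative is obtained by calling torch.autograd.grad() twice, with create_graph=True on the first call so that the first derivative can itself be differentiated.

```python
import torch

def smooth_step(x, k=50.0):
    # Smooth stand-in for heaviside(): sigmoid(k * x) approaches a step
    # as k grows. k is a hypothetical steepness parameter.
    return torch.sigmoid(k * x)

def square_cube_smooth(x, k=50.0):
    s = smooth_step(x, k)
    # Blend the two pieces; the blend itself is differentiable.
    return (1.0 - s) * x**2 + s * x**3

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
y = square_cube_smooth(x)

# First derivative. create_graph=True keeps the graph of the derivative
# so that it can be differentiated again.
(dy,) = torch.autograd.grad(y.sum(), x, create_graph=True)

# Second derivative. Summing is fine because y[i] depends only on x[i],
# so the result is the elementwise second derivative.
(d2y,) = torch.autograd.grad(dy.sum(), x)

print(dy)
print(d2y)
```

Far from x = 0 the derivatives approach those of the individual pieces (2x and 2 on the left, 3x**2 and 6x on the right). Larger k gives a sharper transition but also larger intermediate values near the boundary, so the choice of k is a trade-off that depends on the function being modeled.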