# RuntimeError: derivative for aten::heaviside is not implemented

Hi Folks,

I was trying to implement a piecewise-smooth function with torch.heaviside(), but PyTorch sent me an error message saying that heaviside does not have a derivative implemented.

I know that mathematically the derivative of the Heaviside function at its discontinuity should be considered infinite, but is there any version of PyTorch that would produce something that works similarly?

Thanks,

DB

Hi Daqian!

You are correct that the derivative for `heaviside()` is not implemented.*

However, if I understand your use case correctly, you do not need to
backpropagate through `heaviside()` itself, but rather just through the
pieces of your piecewise function.

There are a number of ways to do this, but you can do it with `heaviside()`
if you protect the call to `heaviside()` with a `torch.no_grad()` block.

Here is a script that illustrates backpropagating through a piecewise
function built using `heaviside()`:

``````
import torch
print (torch.__version__)

def square_cube (x):   # a piecewise-smooth function
    with torch.no_grad():   # protect the non-differentiable heaviside()
        heavy = torch.heaviside (x, torch.tensor ([0.5]))
    return (1.0 - heavy) * x**2 + heavy * x**3

t = torch.arange (-3.0, 3.1, 1.0, requires_grad = True)
print (t)
sc = square_cube (t)
sc.sum().backward()
print (t.grad)
``````

And here is its output:

``````
1.12.0
tensor([-3., -2., -1.,  0.,  1.,  2.,  3.], requires_grad=True)
tensor([-6., -4., -2.,  0.,  3., 12., 27.])
``````
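
As an aside, one of the "number of ways" mentioned above is to build the
same piecewise function with `torch.where()`, whose derivative is
implemented, so no `torch.no_grad()` protection is needed. Here is a
minimal sketch (the function name is mine):

``````
import torch

def square_cube_where (x):   # the same piecewise-smooth function, via where()
    # torch.where() backpropagates through whichever branch is selected,
    # so heaviside() (and its missing derivative) never enters the graph
    return torch.where (x < 0.0, x**2, x**3)

t = torch.arange (-3.0, 3.1, 1.0, requires_grad = True)
square_cube_where (t).sum().backward()
print (t.grad)   # tensor([-6., -4., -2.,  0.,  3., 12., 27.])
``````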

*) `heaviside()` is differentiable everywhere except at its discontinuity,
and its derivative is zero wherever it is defined. So it's arguable that
PyTorch should implement its derivative. However, with a derivative that
is zero everywhere, backpropagation through `heaviside()` isn't really
useful for anything, so not implementing the derivative isn't an
unreasonable choice. It does break your use case, but the
`torch.no_grad()` work-around seems adequate.
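
For completeness: if you did want `heaviside()` itself to participate in
autograd with that (almost-everywhere) zero derivative, a minimal sketch
using a custom `torch.autograd.Function` would look like the following
(the class name is hypothetical):

``````
import torch

class HeavisideZeroGrad (torch.autograd.Function):
    # heaviside() together with its almost-everywhere-zero derivative
    @staticmethod
    def forward (ctx, x, values):
        return torch.heaviside (x, values)

    @staticmethod
    def backward (ctx, grad_output):
        # zero gradient for x; None for the non-differentiable values argument
        return torch.zeros_like (grad_output), None

t = torch.arange (-3.0, 3.1, 1.0, requires_grad = True)
HeavisideZeroGrad.apply (t, torch.tensor ([0.5])).sum().backward()
print (t.grad)   # all zeros -- which is why this isn't very useful
``````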

Best.

K. Frank

Hi Frank,