Hi Daqian!

You are correct that the derivative for `heaviside()` is not implemented.*

However, if I understand your use case correctly, you do not need to backpropagate through `heaviside()` itself, but rather just through the pieces of your piecewise function. There are a number of ways to do this, but you can do it with `heaviside()` if you protect the call to `heaviside()` with a `torch.no_grad()` block.

Here is a script that illustrates backpropagating through a piecewise function built using `heaviside()`:

```
import torch
print (torch.__version__)

def square_cube (x):   # a piecewise-smooth function
    with torch.no_grad():   # heaviside() itself needs no gradient
        heavy = torch.heaviside (x, torch.tensor ([0.5]))
    return (1.0 - heavy) * x**2 + heavy * x**3

t = torch.arange (-3.0, 3.1, 1.0, requires_grad = True)
print (t)

sc = square_cube (t)
sc.sum().backward()
print (t.grad)
```

And here is its output:

```
1.12.0
tensor([-3., -2., -1., 0., 1., 2., 3.], requires_grad=True)
tensor([-6., -4., -2., 0., 3., 12., 27.])
```
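(As an aside, one of the other ways alluded to above is to build the piecewise function with `torch.where()` instead of `heaviside()` — the boolean mask produced by the comparison carries no gradient, so no `torch.no_grad()` block is needed. This is just an illustrative sketch, and `square_cube_where` is a name of my own choosing:

```python
import torch

def square_cube_where (x):
    # the bool mask from the comparison is not differentiated, so
    # gradients flow only through the selected branch values
    return torch.where (x < 0.0, x**2, x**3)

t = torch.arange (-3.0, 3.1, 1.0, requires_grad = True)
square_cube_where (t).sum().backward()
print (t.grad)   # same gradient as the heaviside() version
```

Note that `torch.where()` evaluates both branches everywhere, which is harmless here but can matter if a branch produces `nan`s outside its region.)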

*) `heaviside()` is differentiable, with derivative zero, everywhere except at its discontinuity. So it’s arguable that pytorch should implement its derivative. However, with zero derivative, backpropagation *through* `heaviside()` isn’t really useful for anything, so not implementing its derivative isn’t an unreasonable choice. It does break your use case, but the `torch.no_grad()` work-around seems adequate.
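To make this concrete, here is a sketch of what an implemented `heaviside()` derivative would look like as a custom autograd `Function` (the name `HeavisideZeroGrad` is my own, not part of pytorch), and why it wouldn’t help — the backward pass just produces zeros:

```python
import torch

class HeavisideZeroGrad (torch.autograd.Function):
    # hypothetical heaviside() with its (almost-everywhere) derivative
    @staticmethod
    def forward (ctx, x):
        return torch.heaviside (x, torch.tensor ([0.5]))
    @staticmethod
    def backward (ctx, grad_output):
        return torch.zeros_like (grad_output)   # derivative is zero

t = torch.arange (-3.0, 3.1, 1.0, requires_grad = True)
HeavisideZeroGrad.apply (t).sum().backward()
print (t.grad)   # all zeros -- nothing useful propagates
```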

Best.

K. Frank