You have a step function, so its derivative is 0 almost everywhere
and undefined at the “interesting” point where the step takes place.
What would you want the gradients to be? How would you use such
gradients in a gradient-descent optimization?
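For concreteness, here is a minimal numerical check (a sketch assuming NumPy): central differences around a hard step give a derivative of exactly 0 at every point away from the jump, so there is nothing for an optimizer to follow.

```python
import numpy as np

def step(x):
    # Hard step: 0 for x < 0, 1 for x > 0 (0.5 exactly at 0)
    return np.heaviside(x, 0.5)

# Numerical derivative via central differences
h = 1e-6
for x0 in [-1.0, -0.1, 0.1, 1.0]:
    d = (step(x0 + h) - step(x0 - h)) / (2 * h)
    print(f"x = {x0:+.1f}  d(step)/dx ~ {d}")  # prints 0.0 for all of these points
```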
For backpropagation / gradient descent to work, your functions need
to be usefully differentiable. The typical approach in cases where you
“want” a step function is to use a differentiable “soft” approximation to
the step function, such as sigmoid() or tanh().
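As a rough sketch (assuming PyTorch, with a hypothetical steepness factor k): backpropagating through sigmoid(k * x) yields nonzero gradients everywhere, so gradient descent has something to follow. Larger k makes the curve look more like a hard step, at the cost of tiny gradients far from the transition.

```python
import torch

x = torch.linspace(-2.0, 2.0, 9, requires_grad=True)
k = 10.0  # steepness; larger k -> closer to a hard step, but smaller gradients in the tails
y = torch.sigmoid(k * x)  # smooth stand-in for the step function

y.sum().backward()
print(x.grad)  # nonzero everywhere, largest near the transition at x = 0
```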