Say we have a simple tensor operation pipeline as:
# w, x are 1D tensors and w.requires_grad=True
act = torch.matmul(w, x)
output = torch.clamp(act, min=0).bool().int()  # 1 if act > 0 else 0 -- simply a step function
Now, if I want to apply autograd to this pipeline via output.backward(), is it possible to obtain gradients for w and eventually optimize it?
You have a step function, so its derivative is 0 almost everywhere
and undefined at the “interesting” point where the step takes place.
What would you want the gradients to be? How would you use such
gradients in a gradient-descent optimization?
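You can see this directly in autograd. (Note that the .bool().int() cast in your pipeline actually detaches the result from the graph, so backward() would raise an error; the sketch below uses torch.sign(), which stays in the graph and has a defined gradient of zero, to express the same step function. The example tensor values are arbitrary.)

```python
import torch

w = torch.tensor([0.5, -0.3], requires_grad=True)
x = torch.tensor([1.0, 2.0])

act = torch.matmul(w, x)          # dot product of 1D tensors -> scalar
step = (torch.sign(act) + 1) / 2  # hard step: 1 if act > 0, 0 if act < 0

step.backward()
print(w.grad)                     # tensor([0., 0.]) -- zero everywhere
```

So gradient descent on w would never move: every update is w -= lr * 0.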
For backpropagation / gradient descent to work, your functions need
to be usefully differentiable. The typical approach in cases where you
“want” a step function is to use a differentiable “soft” approximation to
the step function such as sigmoid() or tanh().
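As a minimal sketch of the soft-step approach (the slope factor of 10.0 and the tensor values are arbitrary choices; a larger slope makes sigmoid() closer to a hard step but its gradients smaller away from zero):

```python
import torch

w = torch.tensor([0.5, -0.3], requires_grad=True)
x = torch.tensor([1.0, 2.0])

act = torch.matmul(w, x)
soft = torch.sigmoid(10.0 * act)  # differentiable "soft" step function

soft.backward()
print(w.grad)                     # nonzero, so gradient descent can update w
```

At evaluation time you can still threshold the soft output (e.g. soft > 0.5) to recover a hard 0/1 decision, while training uses the differentiable version.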