**Sorry if my question isn’t appropriate to ask here! It’s a bit of a theory question, but I’d like people to share their knowledge so we can understand what’s happening inside neural networks.**

As far as I understand, PyTorch uses the chain rule to compute gradients of the loss w.r.t. the network parameters.

Therefore, when we use a non-differentiable function such as a step function (torch.sign()) in the network, the gradient won’t be propagated through it (its gradient is zero almost everywhere), so the loss won’t decrease.
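A quick way to see this is the minimal check below (the variable names are just for illustration): the local derivative of torch.sign() is zero almost everywhere, so everything it multiplies in the chain rule vanishes and nothing reaches the input.

```
import torch

x = torch.tensor([0.5, -2.0], requires_grad=True)
z = torch.sign(x)      # step function: tensor([ 1., -1.])
z.sum().backward()

# sign() contributes a local derivative of 0, so no gradient reaches x.
print(x.grad)          # tensor([0., 0.])
```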

In the code below, I implemented a very simple network that contains a step function to see whether I can work around this non-differentiability problem.

Here, I apply backpropagation twice. In the first backward pass, I save the gradient just before the step function (i.e., the gradient w.r.t. its output) and then manually feed that saved gradient into the earlier layers, skipping the step-function part.

Surprisingly, after multiple iterations, the loss becomes 0.

I can’t see why this approach helps decrease the loss even though I’m not providing the correct gradients. If anyone has an idea, please share your insight.

```
import torch
import torch.nn as nn

grads = {}

def save_grad(name):
    # Hook factory: stores the gradient that reaches the hooked tensor under `name`.
    def hook(grad):
        grads[name] = grad
    return hook

fc1 = nn.Linear(2, 2, bias=False)
input = torch.tensor([1., 1.], requires_grad=True)

# Hand-picked initial weights.
wx11 = 1
wx12 = 2
wx21 = 3
wx22 = 4
with torch.no_grad():
    fc1.weight.data = torch.tensor([[wx11, wx21],
                                    [wx12, wx22]], dtype=torch.float)

y = fc1(input)
z = torch.sign(y)           # non-differentiable step function
out = torch.sum(z)
loss = torch.abs(0 - out)   # ground truth is set to 0 this time

print("========outs========")
print(y)
print(z)
print(out)
print(loss)

print("========train========")
y.register_hook(save_grad('y'))
z.register_hook(save_grad('z'))
out.register_hook(save_grad('out'))

# First backward pass: fills grads['z'], the gradient w.r.t. the output of sign().
loss.backward(retain_graph=True)
# Second backward pass: feed that gradient straight into y, skipping the step function.
y.backward(grads['z'])

gamma = 0.01
for i in range(500):
    # zero the gradients accumulated in the previous iteration
    for name, param in fc1.named_parameters():
        if param.requires_grad:
            param.grad.data.zero_()

    y = fc1(input)
    z = torch.sign(y)
    out = torch.sum(z)
    loss = torch.abs(0 - out)

    y.register_hook(save_grad('y'))
    z.register_hook(save_grad('z'))
    out.register_hook(save_grad('out'))

    # backward: once through the whole graph, once skipping the step function
    loss.backward(retain_graph=True)
    y.backward(grads['z'])
    print("loss:", loss.item())

    # manual SGD update
    for name, param in fc1.named_parameters():
        if param.requires_grad:
            param.data = param - gamma * param.grad
```

and part of the output is shown below:

```
========outs========
tensor([ 4., 6.])
tensor([ 1., 1.])
tensor(2.)
tensor(2.)
========train========
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
```
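For what it’s worth, I think the same gradient-skipping idea can also be written as a custom torch.autograd.Function whose backward simply passes the incoming gradient through unchanged, which avoids the double backward() call. This is only my own sketch (the class name StepStraightThrough is made up), not necessarily the proper way to do it:

```
import torch
import torch.nn as nn

class StepStraightThrough(torch.autograd.Function):
    """Forward: sign(x). Backward: pass the gradient through unchanged."""

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend d(sign)/dx == 1, i.e. skip the step function in the chain rule.
        return grad_output

fc1 = nn.Linear(2, 2, bias=False)
input = torch.tensor([1., 1.])
optimizer = torch.optim.SGD(fc1.parameters(), lr=0.01)

for i in range(500):
    optimizer.zero_grad()
    out = torch.sum(StepStraightThrough.apply(fc1(input)))
    loss = torch.abs(0 - out)
    loss.backward()
    optimizer.step()
    print("loss:", loss.item())
```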