# Circumventing a non-differentiability problem

Sorry if this question isn't appropriate here! It's a somewhat theoretical question, but I'd like people to share what they know so we can better understand what's happening in neural networks.

As far as I understand, PyTorch uses the chain rule to compute the gradients of the loss w.r.t. the network parameters.
Therefore, when we use a non-differentiable function such as a step function (torch.sign()) in the network, the gradient won't be propagated, so the loss won't decrease.
In the code below, I implemented a very simple network that contains a step function to see if I can work around this non-differentiability problem.

Here, I apply backpropagation twice. In the first backward pass, I save the gradients just before they pass through the step function, then I manually provide the saved gradients to the remaining layers, skipping the step function.
Surprisingly, after multiple iterations, the loss becomes 0.
I'm not sure why this approach decreases the loss even though I'm not providing the correct gradients. If anyone has an idea, please share some insight.

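``````
import torch
import torch.nn as nn

# NOTE: the hook helper and the input definition were missing from the posted
# code. They are reconstructed here from the description above and from the
# printed output, so treat them as assumptions.
grads = {}

def save_grad(name):
    # Returns a hook that stores the gradient arriving at a tensor.
    def hook(grad):
        grads[name] = grad
    return hook

input = torch.tensor([1.0, 1.0])  # assumed: reproduces the printed y = [4., 6.]

fc1 = nn.Linear(2, 2, bias=False)

wx11 = 1
wx12 = 2
wx21 = 3
wx22 = 4

fc1.weight.data = torch.tensor([[wx11, wx21],
                                [wx12, wx22]], dtype=torch.float)

y = fc1(input)
z = torch.sign(y)
out = z.sum()
loss = (0 - out).abs()  # ground truth is set to 0 this time.

print("========outs========")
print(y)
print(z)
print(out)
print(loss)
print("========train========")
loss.backward(retain_graph=True)

gamma = 0.01
for i in range(500):
    # Reset gradients (the body of this step was missing; zeroing is assumed).
    fc1.zero_grad()

    y = fc1(input)
    z = torch.sign(y)
    z.register_hook(save_grad("z"))  # capture dloss/dz
    out = z.sum()
    loss = (0 - out).abs()

    # backward
    loss.backward(retain_graph=True)  # first pass: fills grads["z"]
    y.backward(grads["z"])            # second pass: skips sign()
    print("loss: ", loss.item())
    with torch.no_grad():
        for name, param in fc1.named_parameters():
            param -= gamma * param.grad
``````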

Part of the output is below:

``````
========outs========
tensor([ 4.,  6.])
tensor([ 1.,  1.])
tensor(2.)
tensor(2.)
========train========
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0
loss:  0.0

``````
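The trick described in the question, backpropagating as if the step function were the identity, is commonly known as a straight-through estimator. Below is a minimal sketch of the same idea written as a custom `torch.autograd.Function`. `SignSTE` and `train` are hypothetical names, and the input `[1., 1.]` is an assumption chosen to match the printed `y = [4., 6.]`. This is not the original poster's code, only an illustration of why the loss can still reach 0: the surrogate gradient pushes both pre-activations down until one of them changes sign.

```python
import torch
import torch.nn as nn


class SignSTE(torch.autograd.Function):
    """sign() in the forward pass, identity gradient in the backward pass.

    Hypothetical helper for illustration; not part of the original post.
    """

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pretend d sign(x)/dx == 1 so the upstream gradient passes through.
        return grad_output


def train(steps=500, gamma=0.01):
    fc1 = nn.Linear(2, 2, bias=False)
    with torch.no_grad():
        fc1.weight.copy_(torch.tensor([[1.0, 3.0], [2.0, 4.0]]))
    x = torch.tensor([1.0, 1.0])  # assumed input, see lead-in
    losses = []
    for _ in range(steps):
        fc1.zero_grad()
        z = SignSTE.apply(fc1(x))
        loss = z.sum().abs()  # ground truth is 0
        loss.backward()
        losses.append(loss.item())
        # Plain gradient descent step on the weights.
        with torch.no_grad():
            for p in fc1.parameters():
                p -= gamma * p.grad
    return losses, fc1


losses, fc1 = train()
print("first loss:", losses[0], "last loss:", losses[-1])
```

Under these assumptions the gradient is not the true one (which is zero almost everywhere), but it is a descent direction for the pre-activations, which is enough for this toy problem. Once the two outputs of `sign` cancel, `abs` has zero gradient at 0 and the weights stop moving.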

Hello,

`torch.sign()` can work with backpropagation.

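``````
import torch

a = torch.randn(3,3,3, requires_grad=True)
b = torch.sign(a)
# The two inspected expressions were missing from the post;
# b.requires_grad and b.grad_fn are the likely ones.
b.requires_grad
>> True
b.grad_fn
>> <SignBackward at 0x7fe7727f3ef0>
``````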

Sure, but in that case (applying backpropagation once, as usual), the loss won't decrease, since the derivative of torch.sign() always returns 0 as the gradient.
So I got the output below:

``````
========train========
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
loss:  2.0
``````
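Both replies above can be true at once: `torch.sign()` does take part in the autograd graph, yet the gradient it passes back is zero. A quick check, as a sketch using only standard tensor calls:

```python
import torch

a = torch.randn(3, 3, 3, requires_grad=True)
b = torch.sign(a)

# Backpropagate a gradient of ones through sign().
b.sum().backward()

print(b.grad_fn)                    # a backward node for sign is recorded
print(torch.count_nonzero(a.grad))  # number of non-zero gradient entries
```

Because every entry of `a.grad` is zero, a plain gradient step leaves the weights unchanged, which matches the constant loss of 2.0 in the output above.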