Sorry if this question isn’t appropriate here! It’s more of a theory question, but I’m hoping people can share their knowledge and help me understand what’s happening in neural networks.
As far as I understand, PyTorch uses the chain rule to compute gradients of the loss w.r.t. the network parameters.
Therefore, when we use a non-differentiable function such as a step function (torch.sign()) in the network, no useful gradient is propagated back through it (its gradient is zero almost everywhere), so the loss won’t decrease.
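A quick way to check this is a minimal sketch like the following. The values 4 and 6 are arbitrary and chosen only for illustration. It backpropagates through `torch.sign` alone and inspects the gradient that reaches its input.

```python
import torch

# Hypothetical pre-activation values, chosen only for illustration.
y = torch.tensor([4.0, 6.0], requires_grad=True)

# torch.sign is piecewise constant, so autograd assigns it a zero derivative.
z = torch.sign(y)
z.sum().backward()

print(y.grad)  # tensor([0., 0.])
```

Any layer placed before `torch.sign` would therefore receive an all-zero gradient from ordinary backpropagation.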
In the code below, I implemented a very simple network that contains a step function to see whether I can work around the non-differentiability problem.
Here, I apply backpropagation twice. The first backward pass saves the gradient that arrives at the output of the step function (grads['z']). The second pass feeds that saved gradient directly into y, the input of the step function, so the linear layer receives it as if the step function had been skipped.
Surprisingly, after multiple iterations, the loss becomes 0.
I’m not sure why this approach decreases the loss even though I’m not providing the correct gradients. If anyone has an idea, I’d appreciate any insight.
import torch
import torch.nn as nn

# Gradients captured by the hooks, keyed by tensor name.
grads = {}

def save_grad(name):
    # Return a hook that stores the gradient flowing into the named tensor.
    def hook(grad):
        grads[name] = grad
    return hook

fc1 = nn.Linear(2, 2, bias=False)
input = torch.tensor([1.0, 1.0], requires_grad=True)
wx11 = 1
wx12 = 2
wx21 = 3
wx22 = 4
with torch.no_grad():
    fc1.weight.copy_(torch.tensor([[wx11, wx21],
                                   [wx12, wx22]], dtype=torch.float32))

y = fc1(input)
z = torch.sign(y)
out = z.sum()
loss = abs(0 - out)  # ground truth is set to 0 this time.
print("========outs========")
print(y)
print(z)
print(out)
print(loss)
print("========train========")
y.register_hook(save_grad('y'))
z.register_hook(save_grad('z'))
out.register_hook(save_grad('out'))
# First pass: the hooks record the gradients; sign() sends zeros back to y.
loss.backward(retain_graph=True)
# Second pass: feed the gradient saved at z directly into y, skipping sign().
y.backward(grads['z'])

gamma = 0.01  # learning rate
for i in range(500):
    # Reset the accumulated gradients before each iteration.
    for name, param in fc1.named_parameters():
        if param.requires_grad:
            param.grad.zero_()
    y = fc1(input)
    z = torch.sign(y)
    out = z.sum()
    loss = abs(0 - out)
    y.register_hook(save_grad('y'))
    z.register_hook(save_grad('z'))
    out.register_hook(save_grad('out'))
    # backward
    loss.backward(retain_graph=True)
    y.backward(grads['z'])
    print("loss:", loss.item())
    # Manual gradient-descent step, done outside of autograd tracking.
    for name, param in fc1.named_parameters():
        if param.requires_grad:
            with torch.no_grad():
                param -= gamma * param.grad
Part of the output is shown below:
========outs========
tensor([ 4., 6.])
tensor([ 1., 1.])
tensor(2.)
tensor(2.)
========train========
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 2.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
loss: 0.0
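For comparison, the two-pass trick above can also be written as a single backward pass. The sketch below is only an illustration, not the code that produced the output above. `SignSTE` is a hypothetical helper name, and it assumes a custom `torch.autograd.Function` whose backward returns the incoming gradient unchanged (an approach often called a straight-through estimator). The initial weights and input are the same as in my code.

```python
import torch
import torch.nn as nn


class SignSTE(torch.autograd.Function):
    """Hypothetical helper: sign() in the forward pass, identity gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Pass the incoming gradient through unchanged, as if sign() were the identity.
        return grad_output


fc1 = nn.Linear(2, 2, bias=False)
with torch.no_grad():
    # Same initial weights as in the original post.
    fc1.weight.copy_(torch.tensor([[1.0, 3.0], [2.0, 4.0]]))

x = torch.tensor([1.0, 1.0])
y = fc1(x)              # tensor([4., 6.])
z = SignSTE.apply(y)    # tensor([1., 1.])
loss = z.sum().abs()    # target is 0, as in the original post
loss.backward()

print(fc1.weight.grad)  # every entry is 1.0
```

With this surrogate gradient every weight is nudged in the same direction on each step, which is the same update the `y.backward(grads['z'])` call produces in the first iteration of my loop.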