I have encountered small discrepancies between the outcome of torch.Tensor.backward() and torch.autograd.functional.jacobian(). Since the differences seem to be larger than machine precision, I have difficulty explaining them and understanding which one is correct.
In the cell below I initialize a very simple feedforward neural network and compute the sum of the gradients of the two outputs with respect to the five inputs, so the outcome consists of five numbers, one for each input variable. I do this using two methods: torch.Tensor.backward() and torch.autograd.functional.jacobian(). At first sight they seem to give the same output, but a closer look reveals that sometimes the results are exactly equal and sometimes they differ by amounts on the order of 1e-9, and it seems random which of the two is the case. I am not an expert on this, but I believe I have verified that all quantities involved are single precision (torch.float32).
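For reference, this is the kind of check I mean by "verified": a minimal sketch, run after the cell below, using the net and inp defined there.

    # Sketch of the dtype check (uses net and inp from the cell below).
    print(inp.dtype)                            # expect torch.float32
    print({p.dtype for p in net.parameters()})  # expect {torch.float32}
    print(net(inp).dtype)                       # expect torch.float32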
I would like to understand where the discrepancy comes from, which of the two methods gives the correct outcome, and (possibly) how I can resolve the discrepancy. Thank you in advance!
import torch
import torch.nn as nn

n_input, n_output, n_hidden = 5, 2, 10

# A small feedforward network: 5 inputs -> 10 hidden units (ReLU) -> 2 outputs (sigmoid).
net = nn.Sequential(nn.Linear(n_input, n_hidden),
                    nn.ReLU(),
                    nn.Linear(n_hidden, n_output),
                    nn.Sigmoid())

for i in range(5):
    inp = torch.rand(5)
    inp.requires_grad = True

    # Method 1: backward() with a vector of ones, i.e. the gradient of the
    # sum of the two outputs with respect to the inputs.
    loss = net(inp)
    loss.backward(gradient=torch.ones(loss.shape))

    # Method 2: full Jacobian, then sum over the output dimension.
    jac = torch.autograd.functional.jacobian(net, inp).sum(axis=0)

    print("\nRandom sample {}:".format(i))
    print(inp.grad, "(using loss.backward)")
    print(jac, "(using torch.autograd.functional.jacobian)")
    print(inp.grad == jac, "(equality check)")
    print(inp.grad - jac, "(difference)")
Output:
Random sample 0:
tensor([ 0.0206, -0.0086, 0.0514, -0.0260, 0.0332]) (using loss.backward)
tensor([ 0.0206, -0.0086, 0.0514, -0.0260, 0.0332]) (using torch.autograd.functional.jacobian)
tensor([ True, False, False, False, True]) (equality check)
tensor([ 0.0000e+00, 9.3132e-10, -3.7253e-09, -1.8626e-09, 0.0000e+00]) (difference)
Random sample 1:
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using loss.backward)
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using torch.autograd.functional.jacobian)
tensor([ True, False, True, False, False]) (equality check)
tensor([ 0.0000e+00, 1.8626e-09, 0.0000e+00, 7.4506e-09, -1.8626e-09]) (difference)
Random sample 2:
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using loss.backward)
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using torch.autograd.functional.jacobian)
tensor([ True, True, False, False, False]) (equality check)
tensor([ 0.0000e+00, 0.0000e+00, -4.6566e-10, 3.7253e-09, -1.8626e-09]) (difference)
Random sample 3:
tensor([ 0.0184, -0.0170, -0.0092, -0.0420, -0.0102]) (using loss.backward)
tensor([ 0.0184, -0.0170, -0.0092, -0.0420, -0.0102]) (using torch.autograd.functional.jacobian)
tensor([True, True, True, True, True]) (equality check)
tensor([0., 0., 0., 0., 0.]) (difference)
Random sample 4:
tensor([ 0.0254, -0.0104, -0.0079, -0.0488, -0.0106]) (using loss.backward)
tensor([ 0.0254, -0.0104, -0.0079, -0.0488, -0.0106]) (using torch.autograd.functional.jacobian)
tensor([False, False, False, False, False]) (equality check)
tensor([-1.8626e-09, -9.3132e-10, -9.3132e-10, -3.7253e-09, -1.8626e-09]) (difference)
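For completeness: an exact comparison is perhaps too strict for float32, and a tolerance-based check like the one below (a minimal sketch using torch.allclose with its default tolerances, rtol=1e-5 and atol=1e-8) should regard differences of this size as equal. Still, I would like to understand why the two methods are not bitwise identical.

    # Tolerance-based comparison instead of exact equality; the default
    # tolerances (rtol=1e-5, atol=1e-8) comfortably absorb ~1e-9 differences.
    print(torch.allclose(inp.grad, jac), "(allclose check)")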