Discrepancy between the outcome of `torch.Tensor.backward()` and `torch.autograd.functional.jacobian()`

I have encountered small discrepancies between the outcome of torch.Tensor.backward() and torch.autograd.functional.jacobian(). Since the differences seem to be larger than machine precision, I have difficulty explaining them and understanding which one is correct.

In the cell below I initialize a very simple feedforward neural network and compute the sum of the gradients of the two outputs with respect to the five inputs. So the outcome consists of five numbers; one for each input variable. I do this using two methods: torch.Tensor.backward() and torch.autograd.functional.jacobian(). At first sight they seem to give the same output, but a closer look learns that sometimes the output is exactly equal and sometimes there is a difference of the order of magnitude 10e-9. It seems random which of the two is the case. I am not an expert on this, but I believe I have verified that all quantities in the game are full precision (torch.float32).

I would like to understand where the discrepancy comes from, which of the two methods gives the correct outcome, and (possibly) how I can resolve the discrepancy. Thank you in advance!

import torch
import torch.nn as nn

n_input, n_output, n_hidden = 5, 2, 10

net = nn.Sequential(nn.Linear(n_input, n_hidden),
                    nn.ReLU(),
                    nn.Linear(n_hidden, n_output),
                    nn.Sigmoid())

for i in range(5):
    
    inp = torch.rand(5)
    inp.requires_grad = True
    
    loss = net(inp)
    loss.backward(gradient=torch.ones(loss.shape))
    
    jac = torch.autograd.functional.jacobian(net, inp).sum(axis=0)
    
    print("\nRandom sample {}:".format(i))
    print(inp.grad, "(using loss.backward)")
    print(jac, "(using torch.autograd.functional.jacobian)")
    print(inp.grad == jac, "(equality check)")
    print(inp.grad - jac, "(difference)")

Output:

Random sample 0:
tensor([ 0.0206, -0.0086,  0.0514, -0.0260,  0.0332]) (using loss.backward)
tensor([ 0.0206, -0.0086,  0.0514, -0.0260,  0.0332]) (using torch.autograd.functional.jacobian)
tensor([ True, False, False, False,  True]) (equality check)
tensor([ 0.0000e+00,  9.3132e-10, -3.7253e-09, -1.8626e-09,  0.0000e+00]) (difference)

Random sample 1:
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using loss.backward)
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using torch.autograd.functional.jacobian)
tensor([ True, False,  True, False, False]) (equality check)
tensor([ 0.0000e+00,  1.8626e-09,  0.0000e+00,  7.4506e-09, -1.8626e-09]) (difference)

Random sample 2:
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using loss.backward)
tensor([ 0.0176, -0.0156, -0.0035, -0.0499, -0.0183]) (using torch.autograd.functional.jacobian)
tensor([ True,  True, False, False, False]) (equality check)
tensor([ 0.0000e+00,  0.0000e+00, -4.6566e-10,  3.7253e-09, -1.8626e-09]) (difference)

Random sample 3:
tensor([ 0.0184, -0.0170, -0.0092, -0.0420, -0.0102]) (using loss.backward)
tensor([ 0.0184, -0.0170, -0.0092, -0.0420, -0.0102]) (using torch.autograd.functional.jacobian)
tensor([True, True, True, True, True]) (equality check)
tensor([0., 0., 0., 0., 0.]) (difference)

Random sample 4:
tensor([ 0.0254, -0.0104, -0.0079, -0.0488, -0.0106]) (using loss.backward)
tensor([ 0.0254, -0.0104, -0.0079, -0.0488, -0.0106]) (using torch.autograd.functional.jacobian)
tensor([False, False, False, False, False]) (equality check)
tensor([-1.8626e-09, -9.3132e-10, -9.3132e-10, -3.7253e-09, -1.8626e-09]) (difference)

Discrepancies of 1e-9 seem expected because thats approximately the number of digits of precision that single precision floats can represent

Have you tried to run with double precision? i.e. torch.float64?

Using double precision (torch.float64 ) indeed reduces the magnitude of the discrepancies from 10e-9 to 10e-18. Thank you for your help!