Jacobian matrix computation

Hi there, I was trying to compute the real Jacobian matrix of the input and output from a network. I figure out two ways to do this, and I think they should be identical. But the results are different and I can’t find why. Below are some example codes to repeat my result.

import torch
from torch import nn
n = 2
in_dim = n
hidden_dims = [10,20]
rnvpNet = nn.ModuleList([])
for h_dim in hidden_dims:
    rnvpNet.append(nn.Linear(in_dim, h_dim))
    in_dim = h_dim

rnvpNet = rnvpNet.append(nn.Linear(h_dim, n))

x = torch.rand(4, n, requires_grad=True)
cur_batch_size = x.shape[0]

# method 1
x1 = x
z1 = x1
for rnvpModule in rnvpNet:
    z1 = rnvpModule(z1)

Jacob1 = torch.eye(n, requires_grad=True).unsqueeze(0).expand(cur_batch_size, n, n)
J1 = Jacob1.clone()
for i in range(n):
    v1 = torch.zeros_like(x)
    v1[:, i] = 1
    J1[:, i, :] = torch.autograd.grad(z1, x1, v1, retain_graph=True)[0]

# method 2
x2 = x.repeat(1, n).view(-1, n)
z2 = x2
for rnvpModule in rnvpNet:
    z2 = rnvpModule(z2)

v2= torch.eye(n).repeat(cur_batch_size, 1)
J2 = torch.autograd.grad(z2, x2, v2, create_graph=True)[0]
J2 = J2.reshape(cur_batch_size, n, n)

J1 method needs to run a for loop, so I think it is slow if n is large. Then I find another implementation, the J2 method from some online sources. I think J1 should be exactly the same as J2. Can anyone help to find what’s wrong with my code?