Automatic gradient and torch.sum

Hi,

I use PyTorch’s automatic gradient to compute the Jacobian and supply it to IPOPT to solve a nonlinear programming (NLP) problem.

Now I want to take the Jacobian of the following equation:

def eval_g(x):
        """ The system of non-linear equilibrium conditions
        x[0]: Capital stock in the next period
        """
        assert len(x) == nvar

        # Consumption today
        con = a * k**alpha * ls**(1-alpha) - x[0]

        # Labor supply tomorrow
        # ls_plus = np.empty(x5.shape)
        ls_plus = torch.empty(x5.shape)
        for aplus_idx, aplus_val in enumerate(aplus):
            ls_plus[aplus_idx] = ls_compute(
                k=x[0], A=aplus_val, alpha=alpha, psi=psi, theta=theta)
        # Capital stock tomorrow
        k_plusplus = torch.empty(aplus.shape)
        for aplus_idx, aplus_val in enumerate(aplus):
            state_plus = torch.tensor([x[0], aplus_val])[None, :].float()
            observed_pred = likelihood(gp_kplus(state_plus))
            # k_plusplus[aplus_idx] = observed_pred.mean
            k_plusplus[aplus_idx] = torch.tensor(k)

        # Consumption tomorrow
        con_plus = aplus * x[0]**alpha * ls_plus**(1-alpha) - k_plusplus

        # ------------------------------------------------------------------- #
        # Euler equation
        # ------------------------------------------------------------------- #
        g0 = 1 / con - beta * alpha * torch.sum(omega5 * (
            1 / con_plus * aplus * x[0]**(alpha-1) * ls_plus**(1-alpha)))

        return np.array([g0])

Note that the equation whose Jacobian I want to compute is g0.

Then I compute the Jacobian of the above function with the following syntax:

def eval_jac_g(x, flag):
    """ Numerical approximation of the Jacobian of the system of
    non-linear equilibrium conditions
    Jacobian is computed by the automatic gradient of PyTorch """
    assert len(x) == nvar

"""
Big skip
"""

    else:
        # Automatic gradient by PyTorch
        assert len(x) == nvar
        _eval_jac_g = np.empty(nnzj)
        for i in range(ncon):
            x_tensor = torch.tensor(x, requires_grad=True)
            eval_g_tensor = eval_g(x_tensor)[i]
            grads = torch.autograd.grad(eval_g_tensor, x_tensor, create_graph=True)
            for j in range(nvar):
                _eval_jac_g[j + i*nvar] = grads[0][j].detach().numpy()
        return _eval_jac_g

In my previous projects I was able to compute the correct Jacobian this way, but this time it does not work.

When I remove the terms starting from torch.sum(), it returns the correct Jacobian; therefore, I strongly suspect that I am doing something wrong with torch.sum().

How should I modify the code with torch.sum() to compute the correct Jacobian?

Any comments are highly appreciated.

Thank you in advance.

Takafumi

Hi,

No, the torch.sum() should not be a problem.
But the fact that your function returns np.array([g0]), a numpy array, might be problematic :wink:

1 Like

Thank you for your comment.
I agree that return np.array([g0]) is suspicious, but to my surprise, it does not seem to be the cause.

For instance if I have:

g0 = 1 / con - beta * alpha

instead of

g0 = 1 / con - beta * alpha * torch.sum(omega5 * (
            1 / con_plus * aplus * x[0]**(alpha-1) * ls_plus**(1-alpha)))

In this case, PyTorch computes the Jacobian correctly; by that I mean the Jacobian is numerically equivalent to the one from IPOPT’s derivative checker.

This is why I am confused…

How do you see the issue?

You can use torch.autograd.gradcheck(eval_g, x) with the input in double precision to check as well.
Also you have a jacobian function here if you need the full Jacobian.

From a quick look at your code, it seems that there are a lot of things that are not defined in the function but captured from other scopes, right?
If you can make a small code sample that uses eval_g and reproduces the issue (you can use our gradcheck so that we won’t need IPOPT to run this toy example), I can take a look.
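
The jacobian helper I have in mind is roughly along these lines (a sketch from memory, not the exact code behind the link):

import torch

def full_jacobian(f, x):
    # Sketch: Jacobian of a vector-valued f at x, one autograd.grad call per output entry
    y = f(x).reshape(-1)
    rows = []
    for i in range(y.numel()):
        grad_i, = torch.autograd.grad(y[i], x, retain_graph=True)
        rows.append(grad_i)
    return torch.stack(rows)

# tiny usage example: the Jacobian of the elementwise square is diag(2 * x)
x = torch.tensor([1.0, 2.0], requires_grad=True)
print(full_jacobian(lambda v: v ** 2, x))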

Hi,

thank you very much for your kind offer! I will definitely check the jacobian function that you proposed, but in the meantime, I attach the code that reproduces my issue below:

import numpy as np
import torch


# --------------------------------------------------------------------------- #
# Parameters
# --------------------------------------------------------------------------- #
k = 0.1
a = 1

alpha = 0.36
beta = 0.95
psi = 0.25
theta = 1.5
s = 0.01
mu = 0
rho = 0.95

# Gauss-Hermite quadrature
# Nodes
x5 = np.sqrt(2) * s * np.array(
    [2.020182870456086, 0.9585724646138185, 0, -0.9585724646138185,
     -2.020182870456086]) + mu
# Weights
omega5 = np.pi**(-1/2) * torch.tensor(
    [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
     0.3936193231522412, 0.01995324205904591])

nvar = 1  # Number of variables
ncon = nvar  # Number of constraints

nnzj = int(nvar * ncon)  # Number of (possibly) non-zeros in Jacobian


def ls_compute(k, A, alpha=alpha, psi=psi, theta=theta):
    """ Return the optimal labor supply """
    return (((1-alpha) * A * k**alpha) / (psi*theta))**(1 / (alpha+theta-1))


# Current labor supply
ls = ls_compute(k=k, A=a, alpha=alpha, psi=psi, theta=theta)

# AR(1) shock
aplus = torch.empty(x5.shape)
for epsilon_idx, epsilon_plus in enumerate(x5):
    aplus[epsilon_idx] = a**rho * np.exp(epsilon_plus)


# --------------------------------------------------------------------------- #
# Functions
# eval_g returns the set of constraints that need to be differentiated
# eval_jac_g returns (1) the structure of the Jacobian and (2) its values
# --------------------------------------------------------------------------- #

def eval_g(x):
    """ The system of non-linear equilibrium conditions
    x[0]: Capital stock in the next period
    """
    assert len(x) == nvar

    # Consumption today
    con = a * k**alpha * ls**(1-alpha) - x[0]

    # Labor supply tomorrow
    # ls_plus = np.empty(x5.shape)
    ls_plus = torch.empty(x5.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        ls_plus[aplus_idx] = ls_compute(
            k=x[0], A=aplus_val, alpha=alpha, psi=psi, theta=theta)
    # Capital stock tomorrow
    k_plusplus = torch.empty(aplus.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        # state_plus = torch.tensor([x[0], aplus_val])[None, :].float()
        k_plusplus[aplus_idx] = torch.tensor(k)  # Needs to be modified

    # Consumption tomorrow
    con_plus = aplus * x[0]**alpha * ls_plus**(1-alpha) - k_plusplus

    # ----------------------------------------------------------------------- #
    # Euler equation
    # ----------------------------------------------------------------------- #
    g0 = 1 / con - beta * alpha * torch.sum(omega5 * (
        1 / con_plus * aplus * x[0]**(alpha-1) * ls_plus**(1-alpha)))

    return np.array([g0])


# Test
print(r"Constraint returns {}".format(eval_g(np.array([k]))))
# sys.exit(0)


def eval_jac_g(x, flag):
    """ Numerical approximation of the Jacobian of the system of
    non-linear equilibrium conditions
    Jacobian is computed by the automatic gradient of PyTorch """
    assert len(x) == nvar

    row_idx = np.empty(nnzj, dtype=int)  # Row index
    col_idx = np.empty(nnzj, dtype=int)  # Column index

    # Jacobian matrix structure
    if flag:
        for i in range(ncon):
            for j in range(nvar):
                row_idx[j + i * nvar] = i
                col_idx[j + i * nvar] = j

        return (row_idx, col_idx)

    else:
        # Automatic gradient by PyTorch
        assert len(x) == nvar
        _eval_jac_g = np.empty(nnzj)
        for i in range(ncon):
            x_tensor = torch.tensor(x, requires_grad=True)
            eval_g_tensor = eval_g(x_tensor)[i]
            grads = torch.autograd.grad(eval_g_tensor, x_tensor,
                                        create_graph=True)
            for j in range(nvar):
                _eval_jac_g[j + i*nvar] = grads[0][j].detach().numpy()
        return _eval_jac_g


# Test
print(r"The structure of the Jacobian is {}".format(
    eval_jac_g(np.array([k]), True)))
print(r"The values of each element in the Jacobian are {}".format(
    eval_jac_g(np.array([k]), False)))

The code should be standalone. Please note that in this script the Jacobian structure is flattened, i.e., it should be (df0/dx0, df0/dx1, …, df0/dxn, df1/dx0, …, dfn/dxn). In eval_jac_g, the first branch returns the structure of the Jacobian, and the second one returns its values. Basically, you only need to look at the second branch…
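
To make the flattened layout concrete, here is a tiny illustration with hypothetical sizes (not the actual ones in my problem):

# hypothetical example: ncon = 2 constraints, nvar = 3 variables, so nnzj = 6 entries
ncon_demo, nvar_demo = 2, 3
for i in range(ncon_demo):
    for j in range(nvar_demo):
        print("flat index {} holds df{}/dx{}".format(j + i * nvar_demo, i, j))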

I would highly appreciate it if you could show me the correct syntax to compute the Jacobian.

Thank you very much in advance.

Hi,

some updates. I used the method you suggested last time to compute the Jacobian and also checked the difference between the automatic gradient and a finite difference method. I attach my code. I notice a big difference between the two… I appreciate your help.
(Sorry, it is much easier to compute a finite difference myself than to use the torch.autograd.gradcheck() method…)

import numpy as np
import torch


# --------------------------------------------------------------------------- #
# Parameters
# --------------------------------------------------------------------------- #
k = 0.1
a = 1

alpha = 0.36
beta = 0.95
psi = 0.25
theta = 1.5
s = 0.01
mu = 0
rho = 0.95

# Gauss-Hermite quadrature
# Nodes
x5 = np.sqrt(2) * s * np.array(
    [2.020182870456086, 0.9585724646138185, 0, -0.9585724646138185,
     -2.020182870456086]) + mu
# Weights
omega5 = np.pi**(-1/2) * torch.tensor(
    [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
     0.3936193231522412, 0.01995324205904591])

nvar = 1  # Number of variables
ncon = nvar  # Number of constraints

nnzj = int(nvar * ncon)  # Number of (possibly) non-zeros in Jacobian


def ls_compute(k, A, alpha=alpha, psi=psi, theta=theta):
    """ Return the optimal labor supply """
    return (((1-alpha) * A * k**alpha) / (psi*theta))**(1 / (alpha+theta-1))


# Current labor supply
ls = ls_compute(k=k, A=a, alpha=alpha, psi=psi, theta=theta)

# AR(1) shock
aplus = torch.empty(x5.shape)
for epsilon_idx, epsilon_plus in enumerate(x5):
    aplus[epsilon_idx] = a**rho * np.exp(epsilon_plus)


# --------------------------------------------------------------------------- #
# Functions
# eval_g returns the set of constraints that need to be differentiated
# eval_jac_g returns (1) the structure of the Jacobian and (2) its values
# --------------------------------------------------------------------------- #

def eval_g(x):
    """ The system of non-linear equilibrium conditions
    x[0]: Capital stock in the next period
    """
    assert len(x) == nvar

    # Consumption today
    con = a * k**alpha * ls**(1-alpha) - x[0]

    # Labor supply tomorrow
    # ls_plus = np.empty(x5.shape)
    ls_plus = torch.empty(x5.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        ls_plus[aplus_idx] = ls_compute(
            k=x[0], A=aplus_val, alpha=alpha, psi=psi, theta=theta)
    # Capital stock tomorrow
    k_plusplus = torch.empty(aplus.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        # state_plus = torch.tensor([x[0], aplus_val])[None, :].float()
        k_plusplus[aplus_idx] = torch.tensor(k)  # Needs to be modified

    # Consumption tomorrow
    con_plus = aplus * x[0]**alpha * ls_plus**(1-alpha) - k_plusplus

    # ----------------------------------------------------------------------- #
    # Euler equation
    # ----------------------------------------------------------------------- #
    g0 = 1 / con - beta * alpha * torch.sum(omega5 * (
        1 / con_plus * aplus * x[0]**(alpha-1) * ls_plus**(1-alpha)))

    return np.array([g0])


# Test
print(r"Constraint returns {}".format(eval_g(np.array([k]))))
# sys.exit(0)


def jacobian(y, x, create_graph=False):
    jac = []
    flat_y = y.reshape(-1)
    grad_y = torch.zeros_like(flat_y)
    for i in range(len(flat_y)):
        grad_y[i] = 1.
        grad_x, = torch.autograd.grad(flat_y,
                                      x,
                                      grad_y,
                                      retain_graph=True,
                                      create_graph=create_graph)
        jac.append(grad_x)
        grad_y[i] = 0.
    return torch.stack(jac).reshape(y.shape + x.shape)


def eval_jac_g(x, flag):
    """ Numerical approximation of the Jacobian of the system of
    non-linear equilibrium conditions
    Jacobian is computed by the automatic gradient of PyTorch """
    assert len(x) == nvar

    row_idx = np.empty(nnzj, dtype=int)  # Row index
    col_idx = np.empty(nnzj, dtype=int)  # Column index

    # Jacobian matrix structure
    if flag:
        for i in range(ncon):
            for j in range(nvar):
                row_idx[j + i * nvar] = i
                col_idx[j + i * nvar] = j

        return (row_idx, col_idx)

    else:
        # Automatic gradient by PyTorch
        assert len(x) == nvar
        # _eval_jac_g = np.empty(nnzj)
        # for i in range(ncon):
        #     x_tensor = torch.tensor(x, requires_grad=True)
        #     eval_g_tensor = eval_g(x_tensor)[i]
        #     grads = torch.autograd.grad(eval_g_tensor, x_tensor,
        #                                 create_graph=True)
        #     for j in range(nvar):
        #         _eval_jac_g[j + i*nvar] = grads[0][j].detach().numpy()
        #     print(_eval_jac_g)
        # return _eval_jac_g

        # ------------------------------------------------------------------- #
        # Use the method based on:
        # https://gist.github.com/apaszke/226abdf867c4e9d6698bd198f3b45fb7
        # ------------------------------------------------------------------- #
        x_tensor = torch.tensor(x, requires_grad=True)
        _eval_jac_g = []
        for i in range(ncon):
            _eval_jac_g.append(jacobian(eval_g(x_tensor)[i], x_tensor))
        return torch.stack(_eval_jac_g).flatten().numpy()


# Test
print(r"The structure of the Jacobian is {}".format(
    eval_jac_g(np.array([k]), True)))
print(r"The values of each element in the Jacobian are {}".format(
    eval_jac_g(np.array([k]), False)))

# --------------------------------------------------------------------------- #
# Finite difference
# --------------------------------------------------------------------------- #
h = 1e-6
grad = (eval_g(np.array([k + h])) - eval_g(np.array([k - h]))) / (2*h)

diff = np.abs(grad - eval_jac_g(np.array([k]), False))
print("Now the difference is {}".format(diff))

The code looks OK to me. You have 1 input and 1 output, so your Jacobian is a single element.

Running the gradcheck as:

from torch.autograd import gradcheck
gradcheck(eval_g, torch.tensor([k], dtype=torch.double, requires_grad=True))

Gives

numerical:tensor([[76.0006]], dtype=torch.float64)
analytical:tensor([[75.7641]], dtype=torch.float64)

Given the number of operations in your formula, it looks plausible to me.
How does your function look if you plot it? Is it properly smooth?

Thank you again for your check. I plotted the eval_g function, as you can see below. It is smooth enough.
I also plot the tangent line whose slope is given by eval_jac_g, which uses the automatic gradient.
How do you interpret the figure? It confuses me very much…
Thank you again.

# --------------------------------------------------------------------------- #
# Plot the eval_g function
# --------------------------------------------------------------------------- #
import matplotlib.pyplot as plt

krange = np.linspace(0.025, 0.2, 100)
eval_g_var = np.empty_like(krange)

# Compute the Jacobian on k=0.1
eval_jac_g_k = eval_jac_g(np.array([k]), False)
eval_g_k = eval_g(np.array([k]))
# Plot a tangent line
tan_k = eval_jac_g_k * (krange - k) + eval_g_k

for idx, var in enumerate(krange):
    eval_g_var[idx] = eval_g(np.array([var]))
plt.plot(krange, eval_g_var, 'k-', label=r'eval_g')
plt.plot(krange, tan_k, 'b-', label=r'eval_jac_g when $x=0.1$')
plt.xlabel(r"$x$")
plt.xlim([0.025, 0.2])
plt.legend(loc='best')
plt.show()

It produces:
[figure: eval_g over k in [0.025, 0.2] with the tangent line at x = 0.1]

Yes, this matches the numerical test: the tangent is quite accurate.

But we still have a considerable difference between the numerical and analytical values, which is not good.

I removed torch.sum from my code and compared the numerical and analytical Jacobians:

# --------------------------------------------------------------------------- #
# without torch.sum()
# --------------------------------------------------------------------------- #
x5 = np.sqrt(2) * s * torch.tensor([0]) + mu
# Weights
omega5 = torch.tensor([1])

# AR(1) shock
aplus = torch.empty(x5.shape)
for epsilon_idx, epsilon_plus in enumerate(x5):
    aplus[epsilon_idx] = a**rho * np.exp(epsilon_plus)


def eval_g_wo_sum(x):
    """ The system of non-linear equilibrium conditions
    without torch.sum()
    x[0]: Capital stock in the next period
    """
    assert len(x) == nvar

    # Consumption today
    con = a * k**alpha * ls**(1-alpha) - x[0]

    # Labor supply tomorrow
    # ls_plus = np.empty(x5.shape)
    ls_plus = torch.empty(x5.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        ls_plus[aplus_idx] = ls_compute(
            k=x[0], A=aplus_val, alpha=alpha, psi=psi, theta=theta)

    # Capital stock tomorrow
    k_plusplus = torch.empty(aplus.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        # state_plus = torch.tensor([x[0], aplus_val])[None, :].float()
        k_plusplus[aplus_idx] = torch.tensor(k)  # Needs to be modified

    # Consumption tomorrow
    con_plus = aplus * x[0]**alpha * ls_plus**(1-alpha) - k_plusplus

    # ----------------------------------------------------------------------- #
    # Euler equation
    # ----------------------------------------------------------------------- #
    g0 = 1 / con - beta * alpha * (omega5 * (
        1 / con_plus * aplus * x[0]**(alpha-1) * ls_plus**(1-alpha)))

    print(g0)
    return np.array([g0])


def eval_jac_g_wo_sum(x, flag):
    """ Numerical approximation of the Jacobian of the system of
    non-linear equilibrium conditions
    Jacobian is computed by the automatic gradient of PyTorch """
    assert len(x) == nvar

    row_idx = np.empty(nnzj, dtype=int)  # Row index
    col_idx = np.empty(nnzj, dtype=int)  # Column index

    # Jacobian matrix structure
    if flag:
        for i in range(ncon):
            for j in range(nvar):
                row_idx[j + i * nvar] = i
                col_idx[j + i * nvar] = j

        return (row_idx, col_idx)

    else:
        # Automatic gradient by PyTorch
        assert len(x) == nvar
        _eval_jac_g_wo_sum = np.empty(nnzj)
        for i in range(ncon):
            x_tensor = torch.tensor(x, requires_grad=True)
            eval_g_wo_sum_tensor = eval_g_wo_sum(x_tensor)[i]
            grads = torch.autograd.grad(eval_g_wo_sum_tensor, x_tensor,
                                        create_graph=True)
            for j in range(nvar):
                _eval_jac_g_wo_sum[j + i*nvar] = grads[0][j].detach().numpy()
        return _eval_jac_g_wo_sum


# --------------------------------------------------------------------------- #
# Gradient check: finite difference
# --------------------------------------------------------------------------- #
h = 1e-6
grad = (eval_g_wo_sum(np.array([k + h]))
        - eval_g_wo_sum(np.array([k - h]))) / (2*h)

diff = np.abs(grad - eval_jac_g_wo_sum(np.array([k]), False))
print("Now the difference is {}".format(diff))

but there is still a considerable difference.
I should debug this, but I have no clue where to start. Could you show me how I should debug and check what is going on in the automatic gradient in PyTorch?

By the way, I also print g0 and it returns

tensor([-0.7946])
tensor([-0.7947])
tensor([-0.7947], grad_fn=<SubBackward0>)

Is that normal?
Thanks for your help.

I would assume that they come from numerical precision issues.
Unfortunately, the more computations you do, the more the small floating point errors accumulate.
Given the large number of operations you perform to evaluate g0, I’m not surprised that you only get 3 digits of precision (remember that a single-precision float carries only about 6-7 significant digits per operation).
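
As a rough illustration (a toy chain of operations, not your actual model), you can compare the same computation in single and double precision:

import torch

def chain(x):
    # a toy sequence of nonlinear ops, loosely mimicking a long formula like g0
    y = x
    for _ in range(50):
        y = y ** 0.36 / (1.0 + y) + 1e-3
    return y

x32 = torch.tensor([0.1], dtype=torch.float32)
x64 = torch.tensor([0.1], dtype=torch.float64)
# float32 carries roughly 7 significant digits; the float64 result serves as the reference
print(chain(x32).item(), chain(x64).item())
print(abs(chain(x32).double() - chain(x64)).item())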

Thank you again for your comment.

I’m not surprised that you only get 3 digits of precision (remember that a single-precision float carries only about 6-7 significant digits per operation)

What did you mean here? Do the 3 digits of precision come from the fact that, in the above example, I have

tensor([-0.7946])
tensor([-0.7947])
tensor([-0.7947], grad_fn=<SubBackward0>)

Is that correct?

So you might suspect that the difference between the automatic differentiation and the numerical approximation with torch.autograd.gradcheck() comes from a numerical precision issue. Would you then recommend that I use the Jacobian computed by torch.autograd.grad in the optimization?

What did you mean here?

It comes from the code sample with gradcheck above. Playing with the value of k, you get from 2 to 4 digits of precision in the computed gradient.

So you might suspect that the difference between the automatic differentiation and the numerical approximation with torch.autograd.gradcheck() comes from a numerical precision issue.

Yes, I think this is where it comes from.
The gradients from finite differences are not more correct than the ones computed with autograd; they just make different approximations and so give slightly different results in the end. Both should be good enough to use in a stochastic optimization procedure.

Thank you again for your comment. I have almost understood the issue.
May I ask again how to use torch.autograd.gradcheck, for future reference?
You mentioned above

grad_chk = torch.autograd.gradcheck(
    eval_g, torch.tensor([k], dtype=torch.double, requires_grad=True))

But it gives an error…

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~/Dropbox/tmp_pytorch.py in <module>
    156 # grad = eval_jac_g(np.array([k]), False)
    157 grad_chk = torch.autograd.gradcheck(
--> 158     eval_g, torch.tensor([k], dtype=torch.double, requires_grad=True))
    159 print(grad_chk)
    160

~/.pyenv/versions/anaconda3-5.3.0/envs/adapt/lib/python3.7/site-packages/torch/autograd/gradcheck.py in gradcheck(func, inputs, eps, atol, rtol, raise_exception, check_sparse_nnz, nondet_tol)
    258
    259     func_out = func(*tupled_inputs)
--> 260     output = _differentiable_outputs(func_out)
    261
    262     if not output:

~/.pyenv/versions/anaconda3-5.3.0/envs/adapt/lib/python3.7/site-packages/torch/autograd/gradcheck.py in _differentiable_outputs(x)
    182
    183 def _differentiable_outputs(x):
--> 184     return tuple(o for o in _as_tuple(x) if o.requires_grad)
    185
    186

~/.pyenv/versions/anaconda3-5.3.0/envs/adapt/lib/python3.7/site-packages/torch/autograd/gradcheck.py in <genexpr>(.0)
    182
    183 def _differentiable_outputs(x):
--> 184     return tuple(o for o in _as_tuple(x) if o.requires_grad)
    185
    186

AttributeError: 'numpy.ndarray' object has no attribute 'requires_grad'

How should I fix it?
The function whose gradient I want to check is defined as:

def eval_g(x):
    """ The system of non-linear equilibrium conditions
    x[0]: Capital stock in the next period
    """
    assert len(x) == nvar

    # Consumption today
    con = a * k**alpha * ls**(1-alpha) - x[0]

    # Labor supply tomorrow
    # ls_plus = np.empty(x5.shape)
    ls_plus = torch.empty(x5.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        ls_plus[aplus_idx] = ls_compute(
            k=x[0], A=aplus_val, alpha=alpha, psi=psi, theta=theta)
    # Capital stock tomorrow
    k_plusplus = torch.empty(aplus.shape)
    for aplus_idx, aplus_val in enumerate(aplus):
        # state_plus = torch.tensor([x[0], aplus_val])[None, :].float()
        k_plusplus[aplus_idx] = torch.tensor(k)  # Needs to be modified

    # Consumption tomorrow
    con_plus = aplus * x[0]**alpha * ls_plus**(1-alpha) - k_plusplus

    # ----------------------------------------------------------------------- #
    # Euler equation
    # ----------------------------------------------------------------------- #
    g0 = 1 / con - beta * alpha * torch.sum(omega5 * (
        1 / con_plus * aplus * x[0]**(alpha-1) * ls_plus**(1-alpha)))

    return np.array([g0])

Yes, gradcheck only works for torch.Tensor.
You can simply modify your function by changing return np.array([g0]) to return g0.
And make sure you give a Tensor (double is better for precision reasons) as input.
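
Concretely, the calling pattern looks like this (with a stand-in toy function, just to show the shape of the call):

import torch

def toy_g(x):
    # stand-in for your eval_g: takes a tensor, returns a tensor (no np.array around it)
    return 1 / (0.5 - x[0]) - 0.3 * torch.sum(x ** 2)

x = torch.tensor([0.1], dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(toy_g, (x,)))  # prints True when the check passes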

1 Like

Thank you. Now it works.
So we conclude that the difference between the numerical and analytical values stems from a numerical precision issue and from how they are computed in the two methods, and we can safely ignore it even though PyTorch warns that

RuntimeError: Jacobian mismatch for output 0 with respect to input 0,
numerical:tensor([[76.0006]], dtype=torch.float64)
analytical:tensor([[75.7641]], dtype=torch.float64)

We should trust the values from the automatic differentiation. Do you agree with me?

You can play with the input value of k and see that the error changes, but both values remain relatively close.

The error is relatively large, but I don’t think you can get anything better with AD.
If you are not convinced, one way to check would be to split the function that evaluates g0 into smaller pieces and gradcheck each piece. These smaller pieces should all have smaller numerical errors and pass the test.
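
For instance, a rough sketch of what I mean (reusing the parameter values from your script, everything in double precision; the split into pieces is just one possible choice):

import numpy as np
import torch

# parameter values copied from the script above
k, a, alpha, beta, psi, theta, s, mu, rho = 0.1, 1, 0.36, 0.95, 0.25, 1.5, 0.01, 0, 0.95

x5 = np.sqrt(2) * s * torch.tensor(
    [2.020182870456086, 0.9585724646138185, 0, -0.9585724646138185,
     -2.020182870456086], dtype=torch.float64) + mu
omega5 = np.pi**(-1/2) * torch.tensor(
    [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
     0.3936193231522412, 0.01995324205904591], dtype=torch.float64)
aplus = a**rho * torch.exp(x5)  # AR(1) shock, vectorized

def ls_compute(k, A):
    return (((1 - alpha) * A * k**alpha) / (psi * theta))**(1 / (alpha + theta - 1))

ls = ls_compute(k, a)

def con_part(x):
    # piece 1: consumption today only
    return a * k**alpha * ls**(1 - alpha) - x[0]

def expectation_part(x):
    # piece 2: the torch.sum(...) expectation term only
    ls_plus = ls_compute(x[0], aplus)
    con_plus = aplus * x[0]**alpha * ls_plus**(1 - alpha) - k
    return torch.sum(omega5 * (
        1 / con_plus * aplus * x[0]**(alpha - 1) * ls_plus**(1 - alpha)))

x = torch.tensor([k], dtype=torch.double, requires_grad=True)
for piece in (con_part, expectation_part):
    # gradcheck raises if autograd and finite differences disagree beyond tolerance
    print(piece.__name__, torch.autograd.gradcheck(piece, (x,)))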

Thank you for your comments. After some trials, I finally figured out how to make it pass torch.autograd.gradcheck. Just in case, I attach the code below. I appreciate your continuous help.

import numpy as np
import torch

print(r"PyTorch version is {}".format(torch.__version__))

# --------------------------------------------------------------------------- #
# Parameters
# --------------------------------------------------------------------------- #
k = 0.1
a = 1

alpha = 0.36
beta = 0.95
psi = 0.25
theta = 1.5
s = 0.01
mu = 0
rho = 0.95

# Gauss-Hermite quadrature
# Nodes
x5 = np.sqrt(2) * s * torch.tensor(
    [2.020182870456086, 0.9585724646138185, 0, -0.9585724646138185,
     -2.020182870456086], dtype=torch.float64) + mu

# Weights
omega5 = np.pi**(-1/2) * torch.tensor(
    [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
     0.3936193231522412, 0.01995324205904591], dtype=torch.float64)

nvar = 1  # Number of variables
ncon = nvar  # Number of constraints

nnzj = int(nvar * ncon)  # Number of (possibly) non-zeros in Jacobian


def ls_compute(k, A, alpha=alpha, psi=psi, theta=theta):
    """ Return the optimal labor supply """
    return (((1-alpha) * A * k**alpha) / (psi*theta))**(1 / (alpha+theta-1))


# Current labor supply
ls = ls_compute(k=k, A=a, alpha=alpha, psi=psi, theta=theta)

# AR(1) shock
aplus = torch.empty(x5.shape, dtype=torch.float64)
for epsilon_idx, epsilon_plus in enumerate(x5):
    aplus[epsilon_idx] = a**rho * np.exp(epsilon_plus)


# sys.exit(0)
# --------------------------------------------------------------------------- #
# Functions
# eval_g returns the set of constraints that need to be differentiated
# eval_jac_g returns (1) the structure of the Jacobian and (2) its values
# --------------------------------------------------------------------------- #
def eval_g(x):
    """ The system of non-linear equilibrium conditions
    x[0]: Capital stock in the next period
    """

    # Consumption today
    con = a * k**alpha * ls**(1-alpha) - x[0]

    # Labor supply tomorrow
    ls_plus = torch.empty(x5.shape, dtype=torch.float64)
    for aplus_idx, aplus_val in enumerate(aplus):
        ls_plus[aplus_idx] = ls_compute(
            k=x[0], A=aplus_val, alpha=alpha, psi=psi, theta=theta)
    # Capital stock tomorrow
    k_plusplus = torch.empty(aplus.shape, dtype=torch.float64)
    for aplus_idx, aplus_val in enumerate(aplus):
        k_plusplus[aplus_idx] = torch.tensor(k)  # Needs to be modified

    # Consumption tomorrow
    con_plus = aplus * x[0]**alpha * ls_plus**(1-alpha) - k_plusplus

    # ----------------------------------------------------------------------- #
    # Euler equation
    # ----------------------------------------------------------------------- #
    g0 = 1 / con - beta * alpha * torch.sum(omega5 * (
        1 / con_plus * aplus * x[0]**(alpha-1) * ls_plus**(1-alpha)))

    return [g0]


# Test
print(r"Constraint returns {}".format(eval_g(np.array([k]))))
# sys.exit(0)


def eval_jac_g(x, flag):
    """ Numerical approximation of the Jacobian of the system of
    non-linear equilibrium conditions
    Jacobian is computed by the automatic gradient of PyTorch """
    assert len(x) == nvar

    row_idx = np.empty(nnzj, dtype=int)  # Row index
    col_idx = np.empty(nnzj, dtype=int)  # Column index

    # Jacobian matrix structure
    if flag:
        for i in range(ncon):
            for j in range(nvar):
                row_idx[j + i * nvar] = i
                col_idx[j + i * nvar] = j

        return (row_idx, col_idx)

    else:
        # Automatic gradient by PyTorch
        assert len(x) == nvar
        x_tensor = torch.from_numpy(x).requires_grad_(True)
        jac = []
        for i in range(ncon):
            grad, = torch.autograd.grad(eval_g(x_tensor)[i], x_tensor)
            jac.append(grad)
        return torch.stack(jac).flatten().numpy()


# Test
print(r"The structure of the Jacobian is {}".format(
    eval_jac_g(np.array([k]), True)))
print(r"The values of each element in the Jacobian are {}".format(
    eval_jac_g(np.array([k]), False)))

k_tensor = torch.from_numpy(np.array([k])).requires_grad_(True)
chk = torch.autograd.gradcheck(eval_g, k_tensor)
print('Is the gradient check passed? {}'.format(chk))

Hi,

Out of curiosity, what did you change?

Sure, though I am not sure whether I can convince you or not:

  1. I try not to use np.array as much as possible and instead define things as torch.tensor(..., dtype=torch.float64), such as x5 and omega5 in my example.
  2. eval_g returns [g0], not np.array([g0]).
  3. I do not use the jacobian script you proposed above. Instead I compute the gradient myself (excerpt from eval_jac_g):

x_tensor = torch.from_numpy(x).requires_grad_(True)
jac = []
for i in range(ncon):
    grad, = torch.autograd.grad(eval_g(x_tensor)[i], x_tensor)
    jac.append(grad)
return torch.stack(jac).flatten().numpy()

  4. Finally, I use the torch.autograd.gradcheck function with the following syntax:

k_tensor = torch.from_numpy(np.array([k])).requires_grad_(True)
chk = torch.autograd.gradcheck(eval_g, k_tensor)
print('Is the gradient check passed? {}'.format(chk))

chk returns True :slight_smile:

1 Like