What does “perturb” mean in the context of the numerical Jacobian?

mortonjt · January 29, 2021, 11:22pm

I’m looking for clarification on what exactly “perturbation” means in the context of calculating the numerical Jacobian.

To give a bit of context, I’ve been trying to put together some custom autograd functions that let me compute Hessians. Below is a skeleton of the code that I’m working with:

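```python
import torch

def _upward_pass(x):
    # stuff happens - this is the "forward" calculation
    raise NotImplementedError  # returns Vt, some output value

def _downward_pass(x, Et):
    # more stuff happens - this is the "backward" calculation
    # returns E, the gradient w.r.t. x: (dVt/dx)^T @ Et (a vector-Jacobian product)
    raise NotImplementedError

def _adjoint_upward_pass(x, dx):
    # compute the directional derivative of Vt along dx: (dVt/dx) @ dx
    raise NotImplementedError  # returns Vtd

def _adjoint_downward_pass(x, Et, dx):
    # compute the Hessian term: d^2Vt/dx^2 contracted with Et and dx
    raise NotImplementedError  # returns Ed

# the actual autograd function
class ForwardFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        Vt = _upward_pass(x)
        ctx.save_for_backward(x)
        return Vt

    @staticmethod
    def backward(ctx, Et):
        x, = ctx.saved_tensors  # saved_tensors is a tuple
        # calling a second Function makes this backward itself differentiable
        E = ForwardFunctionBackward.apply(x, Et)
        return E

class ForwardFunctionBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, Et):
        E = _downward_pass(x, Et)
        ctx.save_for_backward(x, Et)  # both are needed again in backward
        return E

    @staticmethod
    def backward(ctx, dx):
        x, Et = ctx.saved_tensors
        Vtd = _adjoint_upward_pass(x, dx)       # gradient w.r.t. Et
        Ed = _adjoint_downward_pass(x, Et, dx)  # gradient w.r.t. x
        # one gradient per input of forward(), in the same order: (x, Et)
        return Ed, Vtd
```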
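For comparison, here is a minimal runnable version of the same two-Function pattern. It assumes a toy elementwise function Vt = x**3 so that every pass has a closed form; the helper bodies are hypothetical stand-ins, not the real computations. The point is the structure: save x, call a second Function from backward, and return one gradient per forward input.

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

# Hypothetical closed-form passes for the toy function Vt = x**3 (elementwise).
# They only stand in for the real computations.
def _upward_pass(x):
    return x ** 3

def _downward_pass(x, Et):
    # vector-Jacobian product (dVt/dx)^T @ Et; the Jacobian is diag(3 x^2)
    return 3 * x ** 2 * Et

def _adjoint_upward_pass(x, dx):
    # directional derivative of Vt along dx: (dVt/dx) @ dx
    return 3 * x ** 2 * dx

def _adjoint_downward_pass(x, Et, dx):
    # d/dx of <Et, (dVt/dx) @ dx>; each Vt_i has second derivative 6 x_i
    return 6 * x * Et * dx

class ForwardFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return _upward_pass(x)

    @staticmethod
    def backward(ctx, Et):
        x, = ctx.saved_tensors
        # delegating to a second Function keeps this backward differentiable
        return ForwardFunctionBackward.apply(x, Et)

class ForwardFunctionBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, Et):
        ctx.save_for_backward(x, Et)
        return _downward_pass(x, Et)

    @staticmethod
    def backward(ctx, dx):
        x, Et = ctx.saved_tensors
        grad_x = _adjoint_downward_pass(x, Et, dx)
        grad_Et = _adjoint_upward_pass(x, dx)
        # one gradient per forward input, in order (x, Et)
        return grad_x, grad_Et

# double precision, as gradcheck's default tolerances assume
x = torch.randn(5, dtype=torch.double, requires_grad=True)
print(gradcheck(ForwardFunction.apply, (x,)))      # True
print(gradgradcheck(ForwardFunction.apply, (x,)))  # True
```

Because each toy pass is easy to verify by hand, substituting the real passes one at a time might help localize which one makes gradgradcheck fail.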

As I put this codebase together, I’ve been testing the outputs with gradcheck and gradgradcheck. I got gradcheck working, but gradgradcheck fails intermittently: depending on the inputs, either the numerical output or the analytical output comes back as all zeros. So sometimes my function appears to be “constant”, in the sense that no perturbation changes its output.
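The gradcheck documentation notes that its default tolerances are designed for double-precision inputs. One thing that might be worth ruling out (an assumption about the cause, not a diagnosis) is a perturbation too small to register in the input dtype: if x + eps rounds back to x, the finite difference is exactly zero, which would give an all-zero numerical Jacobian whose appearance depends on the input values. The snippet below checks this for gradcheck's default eps of 1e-6.

```python
import torch

eps = 1e-6  # gradcheck's default perturbation size

for dtype in (torch.float32, torch.float64):
    x = torch.tensor([0.5, 100.0, 1000.0], dtype=dtype)
    # True wherever x + eps rounds back to x, i.e. the perturbation is lost
    lost = (x + eps) == x
    print(dtype, lost.tolist())
# torch.float32 [False, True, True]
# torch.float64 [False, False, False]
```

The same loss could also happen inside the function if a double input is silently cast to a lower precision before the computation.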

Part of what makes development slow is that I struggle to understand the inputs and outputs of all of these functions. My current challenge is understanding what a “perturbation” means in the context of calculating the numerical Jacobian: are the inputs to the function “perturbed” in order to compute gradients? Specifically, should I expect some delta term to be added to dx during the gradgradcheck run?
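To make “perturbation” concrete: gradcheck estimates the numerical Jacobian with central finite differences, nudging each element of each input by +eps and -eps (default eps=1e-6) and measuring how the output changes. The sketch below re-implements that idea by hand; it is not gradcheck's actual code, and `numerical_jacobian`, `f`, and `first_backward` are hypothetical names. It treats the first backward pass as a plain function of (x, Et), which, as far as I understand, is roughly what gradgradcheck checks: the perturbed inputs are x and the grad_outputs (Et here), while the incoming gradient of the second backward (dx in ForwardFunctionBackward.backward) is not perturbed.

```python
import torch

def numerical_jacobian(fn, inputs, eps=1e-6):
    """Central-difference Jacobian of fn with respect to each tensor in inputs.

    Every element of every input is nudged by +eps and then -eps; the change
    in the flattened output, divided by 2*eps, fills one column.
    """
    out = fn(*inputs)
    jacobians = []
    for inp in inputs:
        jac = torch.zeros(out.numel(), inp.numel(), dtype=out.dtype)
        flat = inp.view(-1)  # a view, so writing to flat perturbs inp in place
        for j in range(flat.numel()):
            orig = flat[j].item()
            flat[j] = orig + eps
            out_plus = fn(*inputs).reshape(-1)
            flat[j] = orig - eps
            out_minus = fn(*inputs).reshape(-1)
            flat[j] = orig  # restore the unperturbed value
            jac[:, j] = (out_plus - out_minus) / (2 * eps)
        jacobians.append(jac)
    return jacobians

def f(x):
    # hypothetical test function standing in for the forward pass Vt = f(x)
    return torch.sin(x) * x.sum()

def first_backward(x, Et):
    # the first backward pass as an ordinary function: E = (dVt/dx)^T @ Et
    x = x.clone().requires_grad_(True)
    (E,) = torch.autograd.grad(f(x), x, Et)
    return E

x = torch.randn(4, dtype=torch.double)
Et = torch.randn(4, dtype=torch.double)
J_x, J_Et = numerical_jacobian(first_backward, (x, Et))

# reference values from autograd
J = torch.autograd.functional.jacobian(f, x)  # dVt/dx
H = torch.autograd.functional.hessian(lambda z: (Et * f(z)).sum(), x)
print(torch.allclose(J_Et, J.T, atol=1e-6))  # dE/dEt equals (dVt/dx)^T
print(torch.allclose(J_x, H, atol=1e-6))     # dE/dx equals the Hessian of Et . Vt
```

In this picture, the delta is added to x and to Et, one element at a time; dx only enters on the analytical side, where it plays the role of the vector being back-propagated.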

Any thoughts on understanding the internals of autograd would be greatly appreciated, as would any insights on how to debug this sort of thing.