I’m looking for clarification on what exactly “perturbation” means in the context of calculating the numerical Jacobian.
To give a bit of context, I’ve been trying to put together some custom autograd functions from which I can compute Hessians. Below is a skeleton of the code I’m working with:
```python
import torch

def _upward_pass(x):
    # stuff happens - this is the "forward" calculation
    return Vt  # some output value

def _downward_pass(Et):
    # more stuff happens - this is the "backward" calculation
    return E  # some derivative of x, i.e. dVt / dx * Et, I think, ...

def _adjoint_upward_pass(dx):
    # compute directional gradient
    return Vtd

def _adjoint_downward_pass(E):
    # compute hessian, d^2Vt / dx^2
    return Ed

# the actual autograd function
class ForwardFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        Vt = _upward_pass(x)
        ctx.save_for_backward(x)
        return Vt

    @staticmethod
    def backward(ctx, Et):
        x, = ctx.saved_tensors  # saved_tensors is a tuple, so unpack it
        E = ForwardFunctionBackward.apply(x, Et)
        return E

class ForwardFunctionBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, Et):
        E = _downward_pass(Et)
        ctx.save_for_backward(E)  # E is needed again in backward
        return E

    @staticmethod
    def backward(ctx, dx):
        E, = ctx.saved_tensors
        Vtd = _adjoint_upward_pass(dx)
        Ed = _adjoint_downward_pass(E)
        return Ed, None  # one gradient per forward input (x, Et)
```
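For reference, here is a minimal, self-contained analogue of this two-`Function` structure, using a toy `f(x) = x**3` so both derivative passes are easy to verify by hand (the names and math here are placeholders, not my real upward/downward passes):

```python
import torch

class Cube(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        # delegate to a second Function so that double backward works
        return CubeBackward.apply(x, grad_out)

class CubeBackward(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, grad_out):
        ctx.save_for_backward(x, grad_out)
        return 3 * x ** 2 * grad_out  # (dVt/dx)^T * grad_out

    @staticmethod
    def backward(ctx, grad_grad):
        x, grad_out = ctx.saved_tensors
        # d/dx of (3*x^2*grad_out), and d/d(grad_out) of the same
        return 6 * x * grad_out * grad_grad, 3 * x ** 2 * grad_grad

x = torch.randn(4, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(Cube.apply, (x,))
assert torch.autograd.gradgradcheck(Cube.apply, (x,))
```

This toy version passes both checks for me, so the overall two-`Function` shape seems sound; the failures must be in the details of my real passes.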
As I put this codebase together, I’ve been testing the outputs with `gradcheck` and `gradgradcheck`. I got `gradcheck` working, but `gradgradcheck` fails intermittently, returning either all zeros in the numerical output or all zeros in the analytical output (depending on the inputs). In other words, sometimes my function looks “constant”: no perturbation changes its output.
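For completeness, this is roughly how I invoke the checks (`my_func` here is a stand-in for `ForwardFunction.apply`; I use double-precision inputs since the checks are sensitive to float32 noise):

```python
import torch

def my_func(x):
    return (x ** 2).sum()  # placeholder for ForwardFunction.apply

x = torch.randn(5, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(my_func, (x,), eps=1e-6, atol=1e-4))      # True
print(torch.autograd.gradgradcheck(my_func, (x,), eps=1e-6, atol=1e-4))  # True
```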
Part of what makes development slow is my struggle to understand the inputs and outputs of all of these functions. My current challenge is understanding what a “perturbation” means in the context of calculating the numerical Jacobian: are the inputs to the function “perturbed” in order to compute gradients? Specifically, should I expect some delta term to be added to `dx` during the `gradgradcheck` run?
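My current mental model of the numerical side is a plain central-difference loop over the input elements; sketched out (this is my own sketch, not the actual `torch.autograd.gradcheck` source):

```python
import torch

def numerical_jacobian(f, x, eps=1e-6):
    """Estimate d f(x) / d x by perturbing one input element at a time."""
    x = x.detach().clone()
    y = f(x)
    jac = torch.zeros(y.numel(), x.numel(), dtype=x.dtype)
    flat = x.view(-1)  # shares storage with x, so edits perturb x in place
    for i in range(flat.numel()):
        orig = flat[i].item()
        flat[i] = orig + eps
        y_plus = f(x).reshape(-1)
        flat[i] = orig - eps
        y_minus = f(x).reshape(-1)
        flat[i] = orig  # restore the unperturbed value
        jac[:, i] = (y_plus - y_minus) / (2 * eps)
    return jac

x = torch.randn(3, dtype=torch.double)
jac = numerical_jacobian(lambda t: t ** 2, x)
# for f(t) = t**2 the Jacobian should be diagonal with entries 2*t
```

If this picture is right, then an all-zero numerical Jacobian would mean the checker’s `+/- eps` perturbations never reach whatever my function actually reads, which is what confuses me.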
Any thoughts on understanding the internals of autograd will be greatly appreciated (in addition to any insights on how to debug this sort of thing).