Incorrect result produced by the grad function for torch.log(x.grad) compared to the jacobian and hessian functions

Hi, I have been trying to get the correct Hessian-vector product result using the grad function, but with no luck. I have tried versions 1.11, 1.12, and 1.13, and all behave in the same way. I have created a simple example to illustrate this:

import torch
torch.manual_seed(1234)
torch.set_default_dtype(torch.float64)
# device='cuda:0'
device='cpu'

x = torch.tensor([102.3, -121.3, 1.2, -0.61, 2.5e-5, -1.65e-7], requires_grad=True, device=device)
grad_outputs = torch.ones_like(x, device=device) * torch.tensor([1.0], device=device)  # vector of ones
A = torch.rand((6, 6), device=device)

def simpleFunc_H(input):
    # scalar test function: sum of squares of A @ tanh(input)
    output = torch.square(torch.matmul(A, torch.tanh(input))).sum()
    return output

def simpleFunc_J(input):
    # populate input.grad (with create_graph=True so it stays differentiable)
    # and return its log-magnitude
    y = simpleFunc_H(input)
    y.backward(create_graph=True)
    return torch.log(torch.abs(input.grad) + 1.0e-24)

y = simpleFunc_H(x)
G = simpleFunc_J(x)

# Hessian-vector product directly from x.grad, and the same product via the log trick
Hz = torch.autograd.grad(x.grad, x, grad_outputs=grad_outputs, retain_graph=True)
logHz = torch.autograd.grad(G, x, grad_outputs=grad_outputs, retain_graph=True)

print("Hz=\n",Hz[0])
print("logHz*x.grad=\n",logHz[0]*x.grad)
print("|logHz*x.grad-Hz|=\n",torch.abs(logHz[0]*x.grad-Hz[0]).sum())

H_=torch.autograd.functional.hessian(simpleFunc_H, x)
print("H_=\n",torch.matmul(H_,grad_outputs))

J_=torch.autograd.functional.jacobian(simpleFunc_J, x)
print("J_*x.grad=\n",torch.matmul(J_,grad_outputs)*x.grad)

Hz=
tensor([0.0000, 0.0000, 1.0938, 6.2482, 8.9529, 7.0273])
logHz*x.grad=
tensor([ 0.0000,  0.0000,  0.1845,  4.5437,  9.4831, 15.6093], grad_fn=<...>)
|logHz*x.grad-Hz|=
tensor(11.7260, grad_fn=<...>)
H_=
tensor([0.0000, 0.0000, 1.0938, 6.2482, 8.9529, 7.0273])
J_*x.grad=
tensor([0.0000, 0.0000, 1.0938, 6.2482, 8.9529, 7.0273], grad_fn=<...>)
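
For reference, the same Hessian-vector product can also be computed without going through x.grad at all, e.g. with two plain torch.autograd.grad calls or with torch.autograd.functional.hvp. Below is a standalone sketch that mirrors the setup above; it is only meant as a sanity check for the Hz line:

import torch

torch.manual_seed(1234)
torch.set_default_dtype(torch.float64)

x = torch.tensor([102.3, -121.3, 1.2, -0.61, 2.5e-5, -1.65e-7], requires_grad=True)
v = torch.ones_like(x)
A = torch.rand((6, 6))

def f(inp):
    return torch.square(torch.matmul(A, torch.tanh(inp))).sum()

# Hessian-vector product via two grad calls (no accumulation into x.grad, no log)
(g,) = torch.autograd.grad(f(x), x, create_graph=True)
(hv,) = torch.autograd.grad(g, x, grad_outputs=v)
print("double-grad Hz =", hv)

# the same product via the functional API
_, hv2 = torch.autograd.functional.hvp(f, x, v)
print("functional hvp =", hv2)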

As shown above, the Hessian-times-ones result produced by grad through d/dx(log(|x.grad|)) * x.grad differs from the result of the jacobian implementation (and from the direct hessian result). However, if I remove the torch.square, as in

def simpleFunc_H(input):
    output = (torch.matmul(A, torch.tanh(input))).sum()
    return output

I get the same results across the board:


Hz=
tensor([-0.0000e+00,  0.0000e+00, -1.3045e+00,  2.0985e+00, -1.5511e-04,
         7.6084e-07])
logHz*x.grad=
tensor([-0.0000e+00,  0.0000e+00, -1.3045e+00,  2.0985e+00, -1.5511e-04,
         7.6084e-07], grad_fn=<...>)
|logHz*x.grad-Hz|=
tensor(0., grad_fn=<...>)
H_=
tensor([ 0.0000e+00,  0.0000e+00, -1.3045e+00,  2.0985e+00, -1.5511e-04,
         7.6084e-07])
J_*x.grad=
tensor([ 0.0000e+00,  0.0000e+00, -1.3045e+00,  2.0985e+00, -1.5511e-04,
         7.6084e-07], grad_fn=<...>)

I couldn’t figure out why this is the case. Maybe the change to log space somehow causes numerical overflow/underflow issues in the intermediate steps? But the second example is fine, which contradicts this speculation.
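
If the change to log space were causing overflow or underflow, I would expect to see non-finite or extreme values going into or coming out of the log. A quick way to check this (just a sketch, meant to be pasted right after G = simpleFunc_J(x) in the first script, so that x.grad and G are still in scope):

with torch.no_grad():
    g_abs = torch.abs(x.grad)
    print("min/max |x.grad|:", g_abs.min().item(), g_abs.max().item())
    print("log values      :", torch.log(g_abs + 1.0e-24))
    print("all finite?     :", torch.isfinite(G).all().item())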

I would appreciate it if anyone could help with this.

Many thanks