Hi, I have noticed that the following function yields different values for the Hessian/second derivative (0.0 vs. nan) depending on whether or not it is jitted. Is this intended behavior? I have reproduced this in torch 1.12, 1.13, and 2.1.0:
In [1]: import torch
In [2]: torch.__version__
Out[2]: '2.1.0.dev20230309+cpu'
In [3]: from torch.autograd.functional import hessian
In [4]: def ssp(x):
...: return torch.nn.functional.softplus(x) - 0.69314
...:
In [5]: x = torch.tensor(120.0)
In [6]: hessian(ssp, x)
Out[6]: tensor(0.)
In [7]: hessian(ssp, x)
Out[7]: tensor(0.)
In [8]: ssp = torch.jit.script(ssp)
In [9]: hessian(ssp, x)
Out[9]: tensor(0.)
In [10]: hessian(ssp, x)
Out[10]: tensor(nan)
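For reference, here is a minimal sketch (my own reduction, not verified beyond the session above) that bypasses the hessian helper and computes the second derivative with torch.autograd.grad directly, to check whether the nan comes from the helper or from the scripted backward:

import torch

def ssp(x):
    return torch.nn.functional.softplus(x) - 0.69314

def second_derivative(fn, x):
    # Double backward: differentiate the first gradient with respect to x.
    (g,) = torch.autograd.grad(fn(x), x, create_graph=True)
    (h,) = torch.autograd.grad(g, x)
    return h

x = torch.tensor(120.0, requires_grad=True)
print(second_derivative(ssp, x))      # eager: tensor(0.)

ssp_jit = torch.jit.script(ssp)
print(second_derivative(ssp_jit, x))  # first call after scripting
print(second_derivative(ssp_jit, x))  # second call: nan here, if it matches hessian()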
Note: In some cases only the second call to hessian yields nan, but in most cases the jitted function always gives nan for the second derivative (especially if it is jitted with the decorator @torch.jit.script).
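As a sanity check on what the correct value should be (my own derivation, not from the docs): the second derivative of softplus is sigmoid(x) * (1 - sigmoid(x)), which at x = 120 underflows cleanly to 0.0 in float32, so the eager result looks right and the nan from the scripted path looks like a bug in its autodiff formula:

import torch

x = torch.tensor(120.0)
s = torch.sigmoid(x)   # saturates to exactly 1.0 in float32
print(s * (1.0 - s))   # tensor(0.) -- the value hessian() should return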