When I try to compute the Hessian of a net that uses nn.Mish on a GPU, I get NaNs. I see that exp() is used in the C++ code, which could be the reason. Is being able to take the second derivative of the various internally implemented functions something that is expected to work, or not?
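For reference, a minimal sketch of the kind of computation being described, here on CPU in double precision with a toy net (the reported NaNs occur on GPU; the sizes and setup below are illustrative, not the original code):

```python
import torch
import torch.nn as nn

# Toy net using the built-in Mish activation; sizes are illustrative.
torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 4), nn.Mish(), nn.Linear(4, 1)).double()

x = torch.randn(4, dtype=torch.double)

# Hessian of the scalar output with respect to the input.
H = torch.autograd.functional.hessian(lambda t: net(t).squeeze(), x)
print(H.shape)  # torch.Size([4, 4])
```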
You can manually implement a custom function within the Python API by subclassing `torch.autograd.Function` (with a second derivative as well).
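As a sketch of what that can look like (the class name and derivative formula here are my own, not PyTorch's internal implementation): if `backward` is itself written with differentiable torch ops, autograd can build a graph through it under `create_graph=True`, which is what makes the second derivative available. Using `softplus` instead of a raw `exp()` also keeps the formula numerically stable:

```python
import torch
import torch.nn.functional as F

class Mish(torch.autograd.Function):
    """Hypothetical double-backward-capable Mish: x * tanh(softplus(x))."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x * torch.tanh(F.softplus(x))

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        t = torch.tanh(F.softplus(x))  # softplus avoids exp() overflow
        # d/dx [x * tanh(softplus(x))] = tanh(sp) + x * sech^2(sp) * sigmoid(x)
        return grad_out * (t + x * (1.0 - t * t) * torch.sigmoid(x))

x = torch.randn(3, dtype=torch.double, requires_grad=True)
y = Mish.apply(x).sum()
(g,) = torch.autograd.grad(y, x, create_graph=True)
(h,) = torch.autograd.grad(g.sum(), x)  # elementwise second derivative
print(h)
```

`torch.autograd.gradcheck` and `gradgradcheck` (in double precision) are a convenient way to verify both derivatives of such a function numerically.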
Yes, that’s what I did. But I’m wondering about the general philosophy regarding second derivatives of the various internal functions, since some of them are faster and more thoroughly tested than custom Python implementations.