# Arbitrary-precision floating-point numbers and automatic differentiation

Hi,

I have a likelihood function for which a data point of, say, 68 requires calculating 68 derivatives. I have already hard-coded a decent number of derivatives. Because of the nature of the likelihood function and the ridiculously high order of derivatives I must use, I need an arbitrary-precision floating-point library, for which I use mpmath. I’ve tried optimizing my code by dynamically switching the precision, using parallel python or gnu parallel and HPCs, etc. I’m debating whether I should migrate to C++ to make the calculation even faster. Anyway, my question: does pytorch support arbitrary-precision floating-point numbers in its automatic differentiation? I’d like to experiment with staying in python, using automatic differentiation, and employing GPUs to speed up my workflow and optimize computations. Also, does pytorch’s JIT allow mpmath or another arbitrary-precision floating-point module? For reference, I don’t think Numba allows arbitrary-precision floats.

Thanks,
Zach

Hi Zach!

No, pytorch’s autograd framework uses pre-evaluated analytic*
differentiation, not numerical differentiation.

If you can figure out how to write your likelihood function using
pytorch tensor operations, then autograd will differentiate your
function for you.

Each pytorch tensor operation has a `forward()` method that
calculates the function itself,
and a `backward()` method that calculates the gradient. So,
when the pytorch implementers wrote `torch.sin()`, they called
some math library to compute `sin()`, they analytically differentiated
`sin()` to get `cos()`, and then they called some math library to compute
`cos()`. (Of course, the math library computes `cos()` numerically,
but the fact that `cos()` is the derivative of `sin()` was determined
analytically.)
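A minimal sketch of the point above: autograd applies the analytically
determined derivative (here, `cos` as the derivative of `sin`) by
evaluating it numerically at the point in question.

```python
import torch

x = torch.tensor(0.0, requires_grad=True)
y = torch.sin(x)
y.backward()      # autograd evaluates cos(x) at x = 0.0
print(x.grad)     # tensor(1.) since cos(0) = 1
```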

If you can’t figure out how to write your likelihood function using
pytorch tensor functions, you will have to write your own `backward()`
function for it. (This is perfectly doable, but you’re better off using
the existing pytorch tensor functions, if you can.)
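To illustrate what writing your own `backward()` looks like, here is a
hedged sketch following the `forward()`/`backward()` pattern described
above. The function `f(x) = x**3` and the class name `Cube` are my own
choices for illustration, not anything built into pytorch.

```python
import torch

class Cube(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # save the input so backward() can use it
        ctx.save_for_backward(x)
        return x ** 3

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # analytic derivative of x**3 is 3 * x**2
        return grad_output * 3 * x ** 2

x = torch.tensor(2.0, requires_grad=True)
y = Cube.apply(x)
y.backward()
print(x.grad)  # tensor(12.) since 3 * 2**2 = 12
```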

*) Nothing prevents an implementation from using numerical
differentiation in a `backward()` function – autograd itself neither
knows nor cares how the gradient was computed. I’m just not aware
of any built-in pytorch function that uses numerical differentiation for
its `backward()`. So you could write a `backward()` function that uses
numpy or mpmath or whatever to perform its numerical differentiation,
provided that you repackage the final result you return as a pytorch
tensor.
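As a sketch of this footnote, here is a custom function whose
`backward()` computes its gradient by numerical (central-difference)
differentiation with plain python floats, then repackages the result
as a pytorch tensor. The class name `NumericalSin` is hypothetical.

```python
import math
import torch

class NumericalSin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sin(x)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        h = 1e-6
        # central difference computed outside pytorch, then
        # repackaged as a tensor for autograd
        d = (math.sin(x.item() + h) - math.sin(x.item() - h)) / (2 * h)
        return grad_output * torch.tensor(d)

x = torch.tensor(0.0, requires_grad=True)
NumericalSin.apply(x).backward()
print(x.grad)  # close to cos(0) = 1
```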

Best.

K. Frank

Thanks for the reply! I’m just getting introduced to autodiff in general. Do you think autodiff would outperform a hard-coded function for, say, a 60th derivative that uses the high-order chain rule? Because of Faà di Bruno’s formula, lower-order derivatives are reused over and over in high-order chain-rule calculations. But I’m guessing pytorch has an optimized higher-order chain-rule implementation? E.g., in Faà di Bruno’s formula, the bottleneck seems unavoidable in that so many addition and multiplication operations are applied for higher and higher-order chain-rule derivatives. I suppose using a GPU via pytorch would speed this up. Is anyone here familiar with the higher-order chain rule and pytorch?
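[Editor’s aside, a hedged sketch: one way to experiment with higher-order
derivatives in pytorch is to call `torch.autograd.grad` repeatedly with
`create_graph=True`, which re-runs the chain rule at each order. This says
nothing about performance at order 60.]

```python
import torch

x = torch.tensor(0.5, requires_grad=True)
d = torch.sin(x)
for _ in range(4):  # the 4th derivative of sin is sin again
    (d,) = torch.autograd.grad(d, x, create_graph=True)
print(torch.allclose(d, torch.sin(x)))  # True
```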

Hello Zach!

I don’t really understand what your use case is.

However, you should understand that pytorch is a neural-network
framework, and not a general-purpose automatic-differentiation
package.

Pytorch’s purpose in life is to support building and training neural
networks. A neural network is trained by (partially) minimizing a
loss function with respect to the network’s parameters. For a
variety of reasons, this minimization is performed using various
flavors of gradient descent, for which the autograd
framework calculates the gradient (first derivative) of the loss
with respect to the network’s parameters.

autograd is a very useful specialized tool for pytorch’s use case,
but it is not particularly general-purpose.

There is no direct support in autograd for higher-order derivatives.
There is no support in autograd for differentiating a function
whose pieces don’t already come with derivatives (provided as
`backward()` methods).

Note, to clarify the comment in my previous post about “analytic”
differentiation:

Think of a neural network as a bunch of “simple” functions that
are composed with one another – that is, chained together.
To differentiate such a composition, autograd relies on the authors
of the simple functions to provide their derivatives, packaged in
their `backward()` methods.
One imagines that those authors calculated those derivatives
analytically, but they didn’t have to.

autograd then implements the chain rule numerically to differentiate
the big, composite function, using the derivatives of its pieces obtained
from their `backward()` methods.
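A small sketch of this numerical chain rule: for `y = sin(x**2)`, the
`backward()` of `sin` contributes `cos(x**2)` and the `backward()` of
the squaring contributes `2*x`; autograd multiplies them numerically.

```python
import torch

x = torch.tensor(1.0, requires_grad=True)
y = torch.sin(x ** 2)
y.backward()
# chain rule by hand: cos(x**2) * 2*x evaluated at x = 1
expected = torch.cos(torch.tensor(1.0)) * 2.0
print(torch.allclose(x.grad, expected))  # True
```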

If you think this through, you will see that autograd is great for pytorch’s
use case, but probably not very useful for what I think you might want
to do.

Good luck!

K. Frank