Arbitrary-precision floating-point numbers and automatic differentiation

Hi,

I have a likelihood function in which a data point with value, say, 68 requires me to calculate 68 derivatives. I have already hard-coded a fair number of derivatives. Because the likelihood function requires such ridiculously high-order derivatives, I need an arbitrary-precision floating-point library, for which I use mpmath. I’ve tried optimizing my code by dynamically switching the precision, using Parallel Python or GNU Parallel on HPC clusters, etc., and I’m debating whether I should migrate to C++ to make the calculation even quicker. Anyway, my question: does pytorch support arbitrary-precision floating-point numbers in its automatic differentiation? I’d like to experiment with staying in python, using automatic differentiation, and employing GPUs to speed up my workflow and optimize the computations. Also, does pytorch’s JIT allow mpmath or another arbitrary-precision floating-point module? For reference, I don’t think Numba allows arbitrary-precision floating-point numbers.

Thanks,
Zach

Hi Zach!

No, pytorch’s autograd framework uses pre-evaluated analytic*
differentiation, not numerical differentiation.

If you can figure out how to write your likelihood function using
pytorch tensor operations, then autograd will differentiate your
function without additional work on your part.

autograd uses subclasses of torch.autograd.Function. These
have a forward() method that calculates the function itself,
and a backward() method that calculates the gradient. So,
when the pytorch implementers wrote torch.sin(), they called
some math library to compute sin(), they analytically differentiated
sin() to get cos(), and then they called some math library to compute
cos(). (Of course, the math library computes cos() numerically,
but the fact that cos() is the derivative of sin() was determined
analytically.)
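
As a rough sketch of that pattern (purely illustrative; this is not pytorch’s actual implementation of torch.sin()):

```python
import torch

class MySin(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)     # stash the input for the backward pass
        return torch.sin(x)          # numerical evaluation of the function

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # cos() is the analytically-determined derivative of sin(); here it
        # is evaluated numerically and chained with the incoming gradient.
        return grad_output * torch.cos(x)

x = torch.randn(5, requires_grad=True)
MySin.apply(x).sum().backward()      # apply() is how a custom Function is invoked
print(torch.allclose(x.grad, torch.cos(x)))   # True
```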

If you can’t figure out how to write your likelihood function using
pytorch tensor functions, you will have to write your own backward()
function for it. (This is perfectly doable, but you’re better off using
the existing pytorch tensor functions, if you can.)

*) Nothing prevents an implementation from using numerical
differentiation in a backward() function – all backward() needs
to do is return the correct gradient. The autograd framework neither
knows nor cares how the gradient was computed. I’m just not aware
of any built-in pytorch function that uses numerical differentiation for
its backward(). So you could write a backward() function that uses
numpy or mpmath or whatever to perform its numerical differentiation,
provided that you repackage the final result you return as a pytorch
tensor.
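
For example, here is a hedged sketch (with a made-up, elementwise scalar function f) of a backward() that gets its gradient from mpmath’s numerical differentiation and repackages the result as a pytorch tensor:

```python
import mpmath
import torch

mpmath.mp.dps = 50       # work at 50 decimal digits internally

def f(x):
    # made-up scalar function, evaluated in arbitrary precision
    return mpmath.exp(mpmath.sin(x))

class MpmathFunc(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        vals = [float(f(mpmath.mpf(v.item()))) for v in x.flatten()]
        return torch.tensor(vals, dtype=x.dtype).reshape(x.shape)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # numerical differentiation in arbitrary precision ...
        derivs = [float(mpmath.diff(f, mpmath.mpf(v.item()))) for v in x.flatten()]
        # ... repackaged as an ordinary pytorch tensor
        d = torch.tensor(derivs, dtype=x.dtype).reshape(x.shape)
        return grad_output * d
```

Note, though, that the extra precision is thrown away as soon as the result is cast back to a float32 / float64 tensor; autograd itself only ever sees ordinary pytorch dtypes.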

(Note, pytorch offers torch.autograd.gradcheck() for checking
gradients numerically.)
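
For example (gradcheck wants double-precision inputs, and returns True or raises with a detailed message; the function here is just a made-up example):

```python
import torch

def func(x):
    # made-up test function; any differentiable tensor function will do
    return (x * torch.sin(x)).sum()

inp = torch.randn(6, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(func, (inp,)))   # True if the gradients agree
```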

Best.

K. Frank

Thanks for the reply! I’m just getting introduced to autodiff in general. Do you think autodiff would outperform a hard-coded function for, say, a 60th derivative that uses the high-order chain rule? Because of Faà di Bruno’s formula, lower-order derivatives are reused over and over in high-order chain-rule calculations. But I’m guessing pytorch has an optimized higher-order chain-rule implementation? E.g., in Faà di Bruno’s formula the bottleneck seems unavoidable, in that more and more addition and multiplication operations are needed as the order of the derivative grows. I suppose using a GPU via pytorch would speed this up. Is anyone here familiar with the higher-order chain rule and pytorch?
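
For reference, Faà di Bruno’s formula for the n-th derivative of a composition can be written with the Bell polynomials $B_{n,k}$ as

$$\frac{d^n}{dx^n} f(g(x)) \;=\; \sum_{k=1}^{n} f^{(k)}(g(x))\, B_{n,k}\!\bigl(g'(x),\, g''(x),\, \dots,\, g^{(n-k+1)}(x)\bigr),$$

which is why the same low-order derivatives of g reappear in every higher-order term.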

Hello Zach!

I don’t really understand what your use case is.

However, you should understand that pytorch is a neural-network
framework, and not a general-purpose automatic-differentiation
package.

Pytorch’s purpose in life is to support building and training neural
networks. A neural network is trained by (partially) minimizing a
loss function with respect to the network’s parameters. For a
variety of reasons, this minimization is performed using various
gradient-descent algorithms. To this end pytorch’s autograd
framework calculates the gradient (first-derivative) of the loss
with respect to the network’s parameters.
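
In code, that use case looks something like this (a minimal sketch with a made-up model and made-up data):

```python
import torch

model = torch.nn.Linear(3, 1)                        # a tiny "network"
opt = torch.optim.SGD(model.parameters(), lr=0.1)    # a gradient-descent algorithm
x, target = torch.randn(16, 3), torch.randn(16, 1)   # made-up data

for _ in range(100):
    loss = torch.nn.functional.mse_loss(model(x), target)
    opt.zero_grad()
    loss.backward()   # autograd: gradient of the loss w.r.t. the parameters
    opt.step()        # gradient-descent step on the parameters
```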

autograd is a very useful specialized tool for pytorch’s use case,
but it is not particularly general-purpose.

There is no direct support in autograd for higher-order derivatives.
There is no support in autograd for differentiating a function whose
pieces don’t already have derivatives provided (as backward()
methods).

Note, to clarify the comment in my previous post about “analytic”
differentiation:

Think of a neural network as a bunch of “simple” functions that
are composed with one another – that is, chained together.
autograd differentiates (finds the gradient of) such a function.
To do so, it relies on the authors of the simple functions to
provide their derivatives, packaged in the backward() method.
One imagines that those authors calculated those derivatives
analytically, but they didn’t have to.

autograd then implements the chain rule numerically to differentiate
the big, composite function, using the derivatives of its pieces obtained
from their backward() methods.
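
As a toy illustration of that chaining (just a sketch of the effect, not a claim about autograd’s internals):

```python
import torch

x = torch.tensor(0.5, requires_grad=True)
y = torch.exp(torch.sin(x))   # a composite of two "simple" functions
y.backward()

# The chain rule is applied numerically: exp's backward() hands back
# exp(sin(x)) times the incoming gradient, and sin's backward() multiplies
# that by cos(x).
with torch.no_grad():
    expected = torch.exp(torch.sin(x)) * torch.cos(x)
print(torch.allclose(x.grad, expected))   # True
```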

If you think this through, you will see that autograd is great for pytorch’s
use case, but probably not very useful for what I think you might want
to do.

Good luck!

K. Frank