Is there any method (or what is the easiest way) to compute the matrix divergence? The matrix divergence is given by the following (Divergence - Wikipedia; a screenshot is also attached).
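That is (assuming the row-wise convention from the Wikipedia article), for a matrix-valued function $\mathbf{A}(\mathbf{x})$,

$$(\nabla \cdot \mathbf{A})_i = \sum_j \frac{\partial A_{ij}}{\partial x_j},$$

which is the index convention the einsum in the answer below follows.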
Hi Hao!

Yes. The short story is that you can use func.jacfwd() to compute the full jacobian of A with respect to x and then pick out the specific partial derivatives that you need for the matrix divergence.
First, a couple of comments:

Autograd performs differentiation with respect to entire tensors, not just specific elements of tensors. So you can’t differentiate A[1, 1] with respect to x[1] without also differentiating it with respect to x[2]. (Analogously, in forward-mode autograd, you can’t differentiate A[1, 1] with respect to x[1] without also differentiating A[1, 2] with respect to x[1].) So there is no practical way to compute just the specific individual partial derivatives that you need.
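For concreteness, here is a minimal sketch (using the same example A_of_x as in the script further down) of what this means in backward mode: even though you only ask for the derivative of the single element A[1, 1], autograd hands you its partial derivatives with respect to every element of x:

import torch

def A_of_x (x):                         # same example function as in the script below
    return torch.outer ((x**2).sum() * x, x)

x = torch.tensor ([1., 2., 3.], requires_grad = True)
A = A_of_x (x)
A[1, 1].backward()                      # backward pass for the single scalar A[1, 1]
print (x.grad)                          # tensor([ 8., 72., 24.]) -- all three partials at once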
When you use backward-mode autograd (e.g., .backward()), autograd computes the gradient of a single scalar with respect to all of the elements of potentially multiple tensors. When you use forward-mode autograd, it computes the directional derivatives of all of the elements of potentially multiple tensors with respect to a single direction within the tensor(s) with respect to which you are differentiating.
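The forward-mode counterpart, as a rough sketch using torch.func.jvp(): a single pass gives you the directional derivative of every element of A along one direction in x (here the second basis vector), which is exactly one slice, jac[:, :, 1], of the full jacobian:

import torch

def A_of_x (x):                         # same example function as in the script below
    return torch.outer ((x**2).sum() * x, x)

x = torch.arange (3.) + 1
d = torch.tensor ([0., 1., 0.])         # a single direction in x-space
_, dA = torch.func.jvp (A_of_x, (x,), (d,))
print (dA)                              # dA[i, j] / dx[1] for all i, j -- one slice of the jacobian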
To compute a jacobian, you need the partial derivatives of all of the elements of your output tensor(s) with respect to all of the elements of your input tensor(s). So, regardless of whether you are using backward- or forward-mode autograd, you will need multiple autograd passes, one for each output element you are differentiating, or one for each input element with respect to which you are differentiating.
In backward mode, you would need to perform nine autograd passes (one for each of the nine elements of A). In forward mode you would need three autograd passes (one for each of the three elements of x). So it is likely to be cheaper to use forward mode than backward mode for this use case.
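To make the pass-counting concrete, here is a sketch of roughly what jacfwd() does for you: one forward-mode pass per element of x, i.e. per basis direction, with the three resulting slices stacked into the full 3 x 3 x 3 jacobian. (jacfwd() vectorizes these passes rather than looping in python, but the result is the same.)

import torch

def A_of_x (x):                         # same example function as in the script below
    return torch.outer ((x**2).sum() * x, x)

x = torch.arange (3.) + 1

slices = []
for k in range (3):                     # three forward-mode passes, one per element of x
    e_k = torch.zeros (3)
    e_k[k] = 1.0                        # k-th basis direction
    _, dA = torch.func.jvp (A_of_x, (x,), (e_k,))   # dA[i, j] = dA[i, j] / dx[k]
    slices.append (dA)

jac_manual = torch.stack (slices, dim = -1)          # shape [3, 3, 3], same layout as jacfwd()'s result
print (torch.einsum ('ijj -> i', jac_manual))        # tensor([ 84., 168., 252.])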
You don’t have to run the multiple autograd passes by hand yourself – pytorch packages the full jacobian computation in func.jacrev() (backward mode) and func.jacfwd() (forward mode).

Here is a script that computes your matrix divergence using jacfwd() (because that is likely to be more efficient):
import torch

print (torch.__version__)

def A_of_x (x):   # some example function that mixes the elements of x together
    xsq = (x**2).sum()
    return torch.outer (xsq * x, x)

jacA = torch.func.jacfwd (A_of_x)   # jacA is a functional that computes the jacobian at a specific point

x = torch.arange (3.) + 1           # some point at which to compute the jacobian
print (x)

jac = jacA (x)                      # computes the full jacobian, not just the desired elements
print (jac)

div = torch.einsum ('ijj -> i', jac)   # use einsum to pick out the desired elements and sum
print (div)

# to see what's going on, compute div by picking out elements and summing by hand
# (the mask picks out the jac[i, j, i] elements; this agrees with the einsum result
# here because this example's A is symmetric in its two indices)
div_mask = torch.eye (3).unsqueeze (1).expand (3, 3, 3)
print (div_mask)
print (div_mask * jac)
print ((div_mask * jac).sum ((0, 2)))
And here is its output:
2.4.1
tensor([1., 2., 3.])
tensor([[[ 30.,   4.,   6.],
         [ 32.,  22.,  12.],
         [ 48.,  12.,  32.]],

        [[ 32.,  22.,  12.],
         [  8.,  72.,  24.],
         [ 12.,  66.,  64.]],

        [[ 48.,  12.,  32.],
         [ 12.,  66.,  64.],
         [ 18.,  36., 138.]]])
tensor([ 84., 168., 252.])
tensor([[[1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.]],

        [[0., 1., 0.],
         [0., 1., 0.],
         [0., 1., 0.]],

        [[0., 0., 1.],
         [0., 0., 1.],
         [0., 0., 1.]]])
tensor([[[ 30.,   0.,   0.],
         [ 32.,   0.,   0.],
         [ 48.,   0.,   0.]],

        [[  0.,  22.,   0.],
         [  0.,  72.,   0.],
         [  0.,  66.,   0.]],

        [[  0.,   0.,  32.],
         [  0.,   0.,  64.],
         [  0.,   0., 138.]]])
tensor([ 84., 168., 252.])
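For completeness, here is a sketch of the backward-mode version with func.jacrev() – it produces the same jacobian, and hence the same divergence, just with more autograd passes under the hood for this problem:

import torch

def A_of_x (x):                         # same example function as above
    return torch.outer ((x**2).sum() * x, x)

x = torch.arange (3.) + 1
jac_rev = torch.func.jacrev (A_of_x) (x)        # backward-mode jacobian
div_rev = torch.einsum ('ijj -> i', jac_rev)
print (div_rev)                                 # tensor([ 84., 168., 252.]) -- matches the jacfwd() result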
Best.
K. Frank