Is there any method (or what is the easiest way) to compute the matrix divergence?

The matrix divergence is given by the following (Divergence - Wikipedia, and a screenshot is also attached).


Hi Hao!

Yes. The short story is that you can use `func.jacfwd()` to compute the full jacobian of `A` with respect to `x` and then pick out the specific partial derivatives that you need for the matrix divergence.

First, a couple of comments:

Autograd performs differentiation with respect to entire tensors, not just specific elements of tensors. So you can’t differentiate `A[1, 1]` with respect to `x[1]` without also differentiating it with respect to `x[2]`. (Analogously, in forward-mode autograd, you can’t differentiate `A[1, 1]` with respect to `x[1]` without also differentiating `A[1, 2]` with respect to `x[1]`.) So there is no practical way to compute just the specific individual partial derivatives that you need.
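To illustrate the first point, here is a small sketch (my own, not part of the original answer) showing that a backward pass through a single element of `A` still produces the gradient with respect to all of `x`:

```python
import torch

# Even if we only want d A[1, 1] / d x[1], autograd returns the
# gradient of A[1, 1] with respect to every element of x.
x = torch.tensor([1., 2.], requires_grad=True)
A = torch.outer(x, x)                    # A[i, j] = x[i] * x[j]
grad, = torch.autograd.grad(A[1, 1], x)  # gradient w.r.t. the whole tensor x
print(grad)                              # tensor([0., 4.])  (d x[1]**2 / d x = [0, 2 * x[1]])
```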

When you use backward-mode autograd (e.g., `.backward()`), autograd computes the gradient of a single scalar with respect to all of the elements of potentially multiple tensors. When you use forward-mode autograd, it computes the directional derivatives of all of the elements of potentially multiple tensors with respect to a single direction within the tensor(s) with respect to which you are differentiating.
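A minimal sketch of the forward-mode half of this (my own example, using `torch.func.jvp`): one forward pass computes the directional derivative of every output element along a single input direction, here e₀ = (1, 0, 0):

```python
import torch

def A_of_x(x):
    return torch.outer(x, x)    # simple 3x3 matrix-valued function of x

x = torch.arange(3.) + 1
direction = torch.tensor([1., 0., 0.])
# one forward-mode pass: derivatives of all nine elements of A along one direction
A, dA = torch.func.jvp(A_of_x, (x,), (direction,))
print(dA)   # dA[i, j] = d A[i, j] / d x[0], all elements at once
```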

To compute a jacobian, you need the partial derivatives of all of the elements of your output tensor(s) with respect to all of the elements of your input tensor(s). So, regardless of whether you are using backward- or forward-mode autograd, you will need multiple autograd passes, one for each output element you are differentiating, or one for each input element with respect to which you are differentiating.

In backward mode, you would need to perform nine autograd passes (one for each of the nine elements of `A`). In forward mode you would need three autograd passes (one for each of the three elements of `x`). So it is likely to be cheaper to use forward mode than backward mode for this use case.

You don’t have to run the multiple autograd passes by hand yourself – pytorch packages the full jacobian computation in `func.jacrev()` (backward mode) and `func.jacfwd()` (forward mode).
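As a quick sanity check (my own sketch, not part of the original script), the two modes produce the same jacobian; only the cost differs:

```python
import torch

def A_of_x(x):
    return torch.outer(x, x)    # simple 3x3 matrix-valued function of x

x = torch.arange(3.) + 1
jac_fwd = torch.func.jacfwd(A_of_x)(x)   # forward mode: one pass per input element
jac_rev = torch.func.jacrev(A_of_x)(x)   # backward mode: one pass per output element
print(torch.allclose(jac_fwd, jac_rev))  # True -- same (3, 3, 3) jacobian either way
```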

Here is a script that computes your matrix divergence using `jacfwd()` (because that is likely to be more efficient):

```
import torch
print (torch.__version__)

def A_of_x (x):   # some example function that mixes the elements of x together
    xsq = (x**2).sum()
    return torch.outer (xsq * x, x)

jacA = torch.func.jacfwd (A_of_x)   # jacA is a functional that computes jacobian at a specific point

x = torch.arange (3.) + 1           # some point at which to compute jacobian
print (x)

jac = jacA (x)                      # computes full jacobian, not just desired elements
print (jac)

div = torch.einsum ('ijj -> i', jac)   # use einsum to pick out the desired elements and sum
print (div)

# to see what's going on, compute div by picking out elements and summing by hand
div_mask = torch.eye (3).unsqueeze (1).expand (3, 3, 3)
print (div_mask)
print (div_mask * jac)
print ((div_mask * jac).sum ((0, 2)))
```

And here is its output:

```
2.4.1
tensor([1., 2., 3.])
tensor([[[ 30.,   4.,   6.],
         [ 32.,  22.,  12.],
         [ 48.,  12.,  32.]],

        [[ 32.,  22.,  12.],
         [  8.,  72.,  24.],
         [ 12.,  66.,  64.]],

        [[ 48.,  12.,  32.],
         [ 12.,  66.,  64.],
         [ 18.,  36., 138.]]])
tensor([ 84., 168., 252.])
tensor([[[1., 0., 0.],
         [1., 0., 0.],
         [1., 0., 0.]],

        [[0., 1., 0.],
         [0., 1., 0.],
         [0., 1., 0.]],

        [[0., 0., 1.],
         [0., 0., 1.],
         [0., 0., 1.]]])
tensor([[[ 30.,   0.,   0.],
         [ 32.,   0.,   0.],
         [ 48.,   0.,   0.]],

        [[  0.,  22.,   0.],
         [  0.,  72.,   0.],
         [  0.,  66.,   0.]],

        [[  0.,   0.,  32.],
         [  0.,   0.,  64.],
         [  0.,   0., 138.]]])
tensor([ 84., 168., 252.])
```
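As an extra check (my own addition, not part of the original answer), the divergence of this particular `A_of_x` can also be derived by hand and compared against the autograd result:

```python
import torch

# For A_ij = s * x_i * x_j with s = sum_k x_k**2 (and x in R^3):
#   div_i = sum_j dA_ij/dx_j
#         = sum_j [2*x_j * x_i*x_j + s*(delta_ij*x_j + x_i)]
#         = 2*s*x_i + 4*s*x_i = 6*s*x_i
x = torch.arange(3.) + 1
s = (x ** 2).sum()
div_analytic = 6 * s * x
print(div_analytic)   # tensor([ 84., 168., 252.]), matching the autograd result
```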

Best.

K. Frank