Hi, I have been stuck on a problem, and I’m wondering if there is a solution. Thank you in advance!

I have two embedding vectors, `u` and `v`, both of size `m`, that are inputs to a torch `model`, and I would like to efficiently compute a form of the model's summed Hessian *between* the two vectors. I found that the simple approach is efficient enough for me:

```
import torch
from torch import autograd

x = torch.stack([u, v]).view(1, 2, m)  # batch size of 1; 2 vectors u and v, each of size m
y = model(x)  # y is a scalar
grads = autograd.grad(y, x, create_graph=True)[0].squeeze().sum(1)  # sum across each emb vector (not between them)
grad_u = grads[0]  # a scalar: sum_i dy/du_i
grads = autograd.grad(grad_u, x, retain_graph=True)[0].squeeze().sum(1)  # sum across each emb vector
grad_uv = grads[1]  # a scalar: sum over i,j of d^2 y / (du_i dv_j)
```

Note the `.sum(1)` in each `autograd.grad` line.

Under the hood, autograd computes `grad_uv` to be the sum of second derivatives; if the sizes of `u` and `v` are just `m=2`, that is `(d^2 y)/(du_1 dv_1) + (d^2 y)/(du_1 dv_2) + (d^2 y)/(du_2 dv_1) + (d^2 y)/(du_2 dv_2)`.
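As a sanity check, this matches what I get by materializing the full cross block with `torch.autograd.functional.hessian` and summing it. A sketch, continuing from the snippet above (the flattening into one `2*m` vector is just my own bookkeeping):

```
# f takes the flattened (u, v) pair and returns the scalar model output
def f(x_flat):
    return model(x_flat.view(1, 2, m)).squeeze()

x_flat = torch.stack([u, v]).reshape(-1)          # shape (2*m,): u's entries first, then v's
H = torch.autograd.functional.hessian(f, x_flat)  # shape (2*m, 2*m)
cross = H[:m, m:]                                 # the block d^2 y / (du_i dv_j)
print(torch.allclose(cross.sum(), grad_uv))       # should print True
```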

However, what I would really like is this expression with the sum of squares: `((d^2 y)/(du_1 dv_1))**2 + ((d^2 y)/(du_1 dv_2))**2 + ((d^2 y)/(du_2 dv_1))**2 + ((d^2 y)/(du_2 dv_2))**2`
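The only way I know to get this is to differentiate each component of `dy/du` separately, something like this sketch (again reusing `x`, `y`, and `m` from above):

```
g = autograd.grad(y, x, create_graph=True)[0][0, 0]  # dy/du as a vector, shape (m,)
total = 0.0
for i in range(m):
    # row i of the cross Hessian: d^2 y / (du_i dv_j) for all j
    row = autograd.grad(g[i], x, retain_graph=True)[0][0, 1]  # shape (m,)
    total = total + (row ** 2).sum()
```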

It seems that the chain rule can't give me this. Is it possible to compute this expression without explicitly calling the second `autograd.grad` on *each* element of the vector `dy/du`? I basically want to avoid the extra `for` loop (as in the sketch above), which is common in higher-order grad computations. I was thinking that the vector-Jacobian product within `grad()` could help, but would I need to modify the `autograd` source code?
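For instance, I wondered whether something like the following could work: pass a batch of one-hot vectors as `grad_outputs`, so that a single `grad()` call plays the role of the loop. A sketch, untested; I'm not sure this is the intended use of the `is_grads_batched` flag:

```
g = autograd.grad(y, x, create_graph=True)[0][0, 0]  # dy/du, shape (m,)
rows = autograd.grad(
    g, x,
    grad_outputs=torch.eye(m),  # one basis vector per "loop iteration"
    is_grads_batched=True,      # vmaps over the leading dim of grad_outputs
    retain_graph=True,
)[0]                            # shape (m, 1, 2, m)
cross = rows[:, 0, 1, :]        # d^2 y / (du_i dv_j), shape (m, m)
result = (cross ** 2).sum()     # the sum of squares I'm after
```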

Thank you so much!