# Sum of squared 2nd-order derivatives

Hi, I have been stuck on a problem, and I’m wondering if there is a solution. Thank you in advance!

I have two embedding vectors, `u` and `v`, both of size `m`, that are inputs to a torch `model`, and I would like to efficiently compute a form of the model's summed Hessian between the two vectors. I found that the simple approach below is efficient enough for me:

```python
x = torch.stack([u, v]).view(1, 2, m)   # batch size of 1, 2 vectors u and v, each of size m
y = model(x)                            # y is a scalar
grad_u = torch.autograd.grad(y, u, create_graph=True)[0].sum(1)   # sum_i dy/du_i
grad_uv = torch.autograd.grad(grad_u, v)[0].sum(1)                # sum_{i,j} d^2 y / (du_i dv_j)
```

Note the `.sum(1)` in each `autograd.grad` line.
Under the hood, autograd computes `grad_uv` as the sum of mixed second derivatives: `(d^2 y)/(du_1 dv_1) + (d^2 y)/(du_1 dv_2) + (d^2 y)/(du_2 dv_1) + (d^2 y)/(du_2 dv_2)` when `u` and `v` both have size `m=2`.
However, what I would really like is the sum of squares of those terms: `((d^2 y)/(du_1 dv_1))**2 + ((d^2 y)/(du_1 dv_2))**2 + ((d^2 y)/(du_2 dv_1))**2 + ((d^2 y)/(du_2 dv_2))**2`.
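
To make the difference concrete, here is a small sanity-check sketch in which `model` is replaced by a hypothetical bilinear form `y = u A v^T` (my own toy stand-in, not the real model), so that the mixed second derivatives are exactly the entries of `A`. The summed version above then returns `A.sum()`, whereas what I want is `(A ** 2).sum()`:

```python
import torch

m = 2
A = torch.randn(m, m)
u = torch.randn(1, m, requires_grad=True)
v = torch.randn(1, m, requires_grad=True)

# Toy stand-in for `model`: y = u A v^T, so d^2 y / (du_i dv_j) = A[i, j]
y = (u @ A @ v.T).squeeze()

grad_u = torch.autograd.grad(y, u, create_graph=True)[0].sum(1)   # sum_i dy/du_i
grad_uv = torch.autograd.grad(grad_u, v)[0].sum(1)                # sum_{i,j} d^2 y / (du_i dv_j)

print(torch.allclose(grad_uv, A.sum()))   # True: the plain sum of mixed derivatives
print((A ** 2).sum())                     # the quantity I actually want: sum of the squares
```
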
It seems that the chain rule can't give me this sum of squares. Is it possible to compute it without explicitly running the second `autograd.grad` call over each element of the vector `dy/du`? I basically want to avoid the extra `for` loop that is common in higher-order gradient computations (sketched below). I was thinking that the vector-Jacobian product inside `grad()` could help, but would I need to modify the `autograd` source code?
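
For reference, the brute-force version I am trying to avoid looks roughly like this (a sketch reusing `u`, `v`, `m`, and `y` from the snippets above): one extra backward pass per element of `dy/du`.

```python
# dy/du as a full vector (not summed), so each element can be differentiated again
grad_u = torch.autograd.grad(y, u, create_graph=True)[0]   # shape (1, m)

sum_sq = 0.0
for i in range(m):
    # one row of the mixed Hessian block: d^2 y / (du_i dv_j) for all j
    row = torch.autograd.grad(grad_u[0, i], v, retain_graph=True)[0]
    sum_sq = sum_sq + (row ** 2).sum()   # accumulate the squared entries
```

With vectors of size `m`, this costs `m` extra backward passes, which is exactly the overhead I would like to avoid.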