Computing the Hessian matrix of a function w.r.t a tensor

Hi everyone, I really need some help with the following problem.

I have a list of embeddings [e_1, e_2… e_k].
Each item e_i in the list is a one-dimensional tensor of size, say, 200.

I have successfully trained my embeddings in the following setting:

  • I have a scoring function that takes as an input 3 embeddings: φ(e_i, e_j, e_k)
  • I have a training set that is a set of triples (item_i, item_j, item_k), where item_i, item_j and item_k correspond to the embeddings e_i, e_j, e_k
  • In training, I initialize my embeddings randomly and then train them to minimize the values of φ for the triples in my training set.

I have completed the training and it worked fine, so that part is ok (yay!).
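For context, the training part looks roughly like this (a minimal sketch; the names, the sizes and the toy phi below are placeholders, my real code is more involved):

```python
import torch

k, dim = 1000, 200                                    # number of items / embedding size (placeholders)
embeddings = torch.nn.Parameter(torch.randn(k, dim))  # randomly initialised embeddings, one row per item

def phi(a, b, c):
    # placeholder scoring function for a triple of embeddings; returns a scalar
    return (a * b * c).sum()

training_triples = [(0, 1, 2), (3, 4, 5)]             # placeholder index triples
optimizer = torch.optim.Adam([embeddings], lr=1e-3)

for i, j, l in training_triples:
    loss = phi(embeddings[i], embeddings[j], embeddings[l])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```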

What I am now trying to do is to compute the Hessian matrix for a specific embedding in the list.
To be more specific, given a specific embedding e_x:

  • I can isolate all triples featuring item_x from the training set, let’s call them x_triples
  • I define f = sum([phi(a, b, c) for (a, b, c) in x_triples])
  • I would like to compute the Hessian matrix of f with respect to e_x

I said that e_x is a one-dimensional vector of size 200, so it is like
e_x = [x_1, x_2, x_3, …, x_200]

The Hessian matrix should be a 200x200 matrix that, in row i and column j, contains the second order partial derivative of f with respect to x_i and x_j.
In other words, something like this (the image is shamelessly taken from Wikipedia, but it should hopefully make things a bit clearer).

[image: the Hessian matrix definition, from Wikipedia]
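Reconstructing what that image showed, for a vector of size 200 (this is just the textbook definition):

```latex
H(f) =
\begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1\,\partial x_{200}} \\
\frac{\partial^2 f}{\partial x_2\,\partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2\,\partial x_{200}} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_{200}\,\partial x_1} & \frac{\partial^2 f}{\partial x_{200}\,\partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_{200}^2}
\end{pmatrix}
```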

This leads me to the actual question.

I see on this page that there is a “torch.autograd.functional.hessian” function that may be the way to go for my scenario.
https://pytorch.org/docs/stable/autograd.html

I see from the documentation that its main parameters are:

  • func (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
  • inputs (tuple of Tensors or Tensor) – inputs to the function func.
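If I read the documentation correctly, a minimal usage looks something like this (a toy example, unrelated to my φ):

```python
import torch
from torch.autograd.functional import hessian

def g(x):
    # scalar-valued function of a 1-D tensor
    return (x ** 3).sum()

x = torch.randn(5)
H = hessian(g, x)    # 5x5 matrix; for this g it is diag(6 * x)
print(H.shape)       # torch.Size([5, 5])
```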

I would say that in my case:

  • func would be the sum of phi(a, b, c) over all (a, b, c) in x_triples
  • inputs would be e_x

My problem is that all embeddings e_1, e_2… e_k are used by func, but I only want the Hessian to be computed with respect to e_x (that is, its 200 components).
Is there a way to do that?

(Sorry for the long post)

Do you have some code to reproduce the issue?

  • Also, how exactly does e_x relate to x_triples?
  • I assume you want to compute the Hessian for each embedding vector e_i? So you’d have an output of k Hessian matrices?
  • Also, are the embeddings e_i, e_j, and e_k three random embeddings from the list stated at the beginning, or three components of a single embedding e_i?

Hi, @AlphaBetaGamma96, thanks for replying!
I don’t have any code to reproduce the issue, because I’m still trying to figure out how to write the hessian part :stuck_out_tongue:

I think my previous post was a bit confusing, sorry about that. I’ll try to make it more understandable.

Let’s say I’m computing embeddings of words, and my input is made entirely of trigrams (triples of words).

I have a function φ that can compute the likelihood of a trigram using the embeddings of the corresponding words.
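In code terms, you can think of φ as something with this kind of signature (a toy placeholder, my real φ is more complex):

```python
import torch

def phi(e_a, e_b, e_c):
    # placeholder: returns a scalar "likelihood" for a trigram,
    # given the three word embeddings
    return torch.sigmoid((e_a * e_b * e_c).sum())
```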

What I am trying to do is to compute the Hessian matrix of the likelihood of the trigrams that contain a certain word, with respect to the embedding of that word.

So item_x is the word I want to get the Hessian with respect to; e_x is its trained embedding; x_triples is the list of trigrams that feature the word item_x.
I hope this clarifies your first question, so let me move on to the others.

Second question: no, actually I just need it to run on a specific word at a time, so for one embedding.
It is for an explainability project, so I only need it for the specific embedding I am trying to explain (I won’t try to explain them all).

Third question: in my example e_i, e_j, and e_k were meant to be three separate embeddings.
Each of them is one-dimensional, with size 200.

NLP tasks are a little out of my skill set, but what I’d suggest is to figure out exactly how the embedding of a given word is linked to the likelihood of the trigrams, and build a function that takes that embedding and returns the likelihood. Then you can just use torch.autograd.functional.hessian to calculate it!
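Something along these lines might work as a starting point (a rough, untested sketch; the block of placeholders at the top stands in for your real embeddings, phi, item index and triples):

```python
import torch
from torch.autograd.functional import hessian

# --- placeholders standing in for your real objects ---------------------
k, dim = 10, 200
embeddings = torch.randn(k, dim)               # your trained embeddings, one row per item
def phi(a, b, c):                              # your likelihood / scoring function
    return (a * b * c).sum()
x_idx = 3                                      # index of item_x (the word to explain)
x_triples = [(3, 1, 2), (0, 3, 5), (4, 6, 3)]  # index triples that feature item_x
# -------------------------------------------------------------------------

def f(e_x):
    # f takes only e_x as its input; every other embedding is a constant inside it,
    # so hessian() differentiates with respect to the 200 components of e_x alone
    total = e_x.new_zeros(())
    for i, j, l in x_triples:
        a = e_x if i == x_idx else embeddings[i]
        b = e_x if j == x_idx else embeddings[j]
        c = e_x if l == x_idx else embeddings[l]
        total = total + phi(a, b, c)
    return total

e_x = embeddings[x_idx].clone()   # the embedding you want to explain
H = hessian(f, e_x)               # shape (200, 200)
print(H.shape)
```

The key point is that hessian only differentiates with respect to the tensors you pass as inputs, so the other embeddings that f uses internally are simply treated as constants, and you get exactly the 200x200 matrix for e_x.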