Hi everyone, I really need some help with the following problem.
I have a list of embeddings [e_1, e_2… e_k].
Each item e_i in the list is a one-dimensional tensor of size, say, 200.
I have successfully trained my embeddings in the following setting:
- I have a scoring function that takes as an input 3 embeddings: φ(e_i, e_j, e_k)
- I have a training set that is a set of triples (item_i, item_j, item_k), where item_i, item_j and item_k correspond to the embeddings e_i, e_j, e_k
- In training, I initialize my embeddings randomly and then train them to minimize the values of φ for the triples in my training set.
I have completed the training and it worked fine, so that part is ok (yay!).
What I am now trying to do is to compute the Hessian matrix for a specific embedding in the list.
To be more specific, given a specific embedding e_x:
- I can isolate all triples featuring item_x from the training set, let’s call them x_triples
- I define f = sum([phi(a, b, c) for (a, b, c) in x_triples])
- I would like to compute the Hessian matrix of f with respect to e_x
As I said, e_x is a one-dimensional vector of size 200, so it looks like
e_x = [x_1, x_2, x_3 … x_200]
The Hessian matrix should be a 200x200 matrix that, in row i and column j, contains the second-order partial derivative of f with respect to x_i and x_j.
In other words, something like this (the image is shamelessly taken from Wikipedia, but hopefully it makes things a bit clearer).
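Written out for f and the components x_1 … x_200 of e_x, the matrix described above would look as follows. The name H_f is only notation introduced here for illustration:

```latex
H_f(e_x) =
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \, \partial x_{200}} \\[2ex]
\dfrac{\partial^2 f}{\partial x_2 \, \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \, \partial x_{200}} \\[2ex]
\vdots & \vdots & \ddots & \vdots \\[1ex]
\dfrac{\partial^2 f}{\partial x_{200} \, \partial x_1} & \dfrac{\partial^2 f}{\partial x_{200} \, \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_{200}^2}
\end{bmatrix},
\qquad
\left(H_f\right)_{ij} = \dfrac{\partial^2 f}{\partial x_i \, \partial x_j}
```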
This leads me to the actual question.
I see on this page that there is a “torch.autograd.functional.hessian” function that may be the way to go for my scenario.
https://pytorch.org/docs/stable/autograd.html
I see from the documentation that its main parameters are:
- func (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
- inputs (tuple of Tensors or Tensor) – inputs to the function func.
I would say that in my case:
- func would be the sum of all phi(a, b, c) for any (a, b, c) in x_triples
- inputs is e_x
My problem is that all the embeddings e_1, e_2… e_k are used by func, but I only want the Hessian to be computed with respect to e_x (that is, its 200 components).
Is there a way to do that?
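To make the setup concrete, here is a minimal sketch of one way this might be wired up: func is written as a closure that takes only e_x as its argument and reads every other embedding from the enclosing scope, so those are treated as constants. Everything in it is an assumption for illustration, not my real code: the toy phi, the random embeddings, the index-based triples and the choice x = 2.

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)

dim = 200        # size of each embedding
num_items = 5    # hypothetical number of items

# Stand-ins for the trained embeddings (random here, only for this sketch).
embeddings = [torch.randn(dim) for _ in range(num_items)]

def phi(a, b, c):
    # Hypothetical scoring function; the real phi would go here.
    return (a * b * c).sum() + 0.5 * ((a - c) ** 2).sum()

# Hypothetical training triples, stored as indices into `embeddings`.
triples = [(2, 0, 1), (0, 2, 3), (4, 1, 2), (0, 1, 3)]

x = 2  # index of the embedding we want the Hessian for
x_triples = [t for t in triples if x in t]

def f(e_x):
    # Only e_x is an argument, so it is the only thing differentiated.
    # The other embeddings are looked up from the outer scope.
    total = torch.zeros(())
    for (i, j, k) in x_triples:
        a, b, c = (e_x if idx == x else embeddings[idx] for idx in (i, j, k))
        total = total + phi(a, b, c)
    return total

H = hessian(f, embeddings[x])
print(H.shape)  # torch.Size([200, 200])
```

With this toy phi the result can be checked by hand: the trilinear term is linear in e_x and contributes nothing, while the squared term contributes an identity matrix each time e_x appears in the first or third position, so here H should come out as 2 times the identity. Note that this computes one backward pass per component of e_x, so it may get slow if the embeddings are much larger.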
(Sorry for the long post)