I have a function f(x, a) that uses a first-order Taylor expansion, e.g. f(x, a) = g(a) + g'(a)(x - a), where g'(x) = dg(x)/dx is computed with torch.autograd. I would now like to differentiate it with respect to a, i.e. compute df(x, a)/da. To simplify the function even further, let's say that f(a) = g'(a):

def g(x):
    return x

def f(a):
    return torch.autograd.functional.jacobian(g, a)

If I visualize the computational graph, it seems that I lose the gradient when computing the Jacobian.
In particular, using these lines I get an empty graph:

a = torch.ones(1, requires_grad=True)
torchviz.make_dot(f(a), params=dict(a=a))

Now, my question is: how can I implement this computation?

If I understand your use case, I would call autograd.grad() twice, passing create_graph=True to the first call so that autograd can differentiate the resulting derivative in the second call.

Here is an example script where g is assumed to be a scalar function:

import torch
print(torch.__version__)

def g(x):
    return x**3

def f(x, a):
    ga = g(a)
    # create_graph=True records the graph of g'(a) so it can be differentiated again
    gpa = torch.autograd.grad(ga, a, create_graph=True)[0]
    return ga + gpa * (x - a)

x = torch.tensor([2.1])
a = torch.tensor([2.0], requires_grad=True)
fxa = f(x, a)
print('fxa =', fxa)
# second autograd.grad call: df(x, a)/da
fpa = torch.autograd.grad(fxa, a)[0]
print('fpa =', fpa)
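
As a side note on the jacobian-based version from the question: torch.autograd.functional.jacobian takes a create_graph argument that defaults to False, in which case the returned Jacobian is detached, which would be consistent with the empty graph. Below is a minimal sketch of that variant, assuming g(x) = x**3 and one-element tensors as in the script above. The names jac, detached and expected are illustrative and not from the original posts, and the closed form 6 a (x - a) only holds for this particular g.

```python
import torch

def g(x):
    return x ** 3

def f(x, a):
    # create_graph=True asks jacobian() to record its own computation,
    # so the result stays differentiable with respect to a.
    jac = torch.autograd.functional.jacobian(g, a, create_graph=True)
    # jac has shape (1, 1) here, so the matrix-vector product is g'(a) (x - a).
    return g(a) + jac @ (x - a)

x = torch.tensor([2.1])
a = torch.tensor([2.0], requires_grad=True)

# With the default create_graph=False the Jacobian is detached from the graph.
detached = torch.autograd.functional.jacobian(g, a)
print('default requires_grad =', detached.requires_grad)

fxa = f(x, a)
fpa = torch.autograd.grad(fxa.sum(), a)[0]
print('fpa =', fpa)

# Closed form for g(x) = x**3: df/da = g''(a) (x - a) = 6 a (x - a)
expected = 6 * a.detach() * (x - a.detach())
print('expected =', expected)
```

For a scalar g, calling autograd.grad() twice as above is probably the simpler route; the jacobian() form may be more convenient when g is vector-valued.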