 # Computing vector-Jacobian and Jacobian-vector product efficiently

I need to compute both the vector-Jacobian product and the Jacobian-vector product at the same time, and then backprop through both. I have the following code, which I have tested and believe works correctly:

```python
def vjp(f, x, v, create_graph=True):
    y = f(x)
    y.backward(v, create_graph=create_graph)
    return x.grad

def jvp(f, x, v, create_graph=True):
    g = lambda v: vjp(f, x, v, create_graph=True)
    return vjp(g, v, v, create_graph=create_graph)

def get_loss(f, x, v):
    vjp_val = vjp(f, x, v)
    jvp_val = jvp(f, x, v)

    return (vjp_val - jvp_val).norm(1)
```

It is however inefficient, as it effectively computes `f(x).backward(v)` twice. Hence I would like to rewrite it in such a way that it only does so once. Here is my attempt:

```python
def get_loss_fast(f, x, v):
    y = f(x)
    y.backward(v, create_graph=True)
    vjp_val = x.grad

    vjp_val.backward(v, create_graph=True)
    jvp_val = x.grad

    return (vjp_val - jvp_val).norm(1)
```

This code always returns zero. In fact, inside the `get_loss_fast` function, `vjp_val is jvp_val` is `True`, which means that the second `backward()` does not overwrite the output of the first one.
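This accumulation behaviour is easy to check in isolation (a minimal sketch, separate from the code above): `.backward()` adds into `.grad` rather than overwriting it:

```python
import torch

x = torch.rand(3, requires_grad=True)
(x**2).sum().backward()
g1 = x.grad.clone()          # first gradient: 2 * x
(x**2).sum().backward()      # second backward accumulates, it does not overwrite
print(torch.allclose(x.grad, 2 * g1))  # True
```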

How can I compute this loss efficiently and correctly?

Hi,

First, for such tasks I would advise using `autograd.grad()` instead of `.backward()`, as `.backward()` might create reference cycles.
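For example (my own minimal sketch, not from the original post), a VJP computed with `autograd.grad()` comes back as a return value instead of being written into `x.grad`:

```python
import torch

def f(x):
    return x**2

x = torch.rand(3, requires_grad=True)
v = torch.rand(3)

y = f(x)
# grad_outputs=v makes this the vector-Jacobian product v^T J
vjp_val, = torch.autograd.grad(y, x, grad_outputs=v, create_graph=True)
print(torch.allclose(vjp_val, 2 * x * v))  # True
```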

Something like this should work, no?

```python
def get_loss_fast(f, x, v):
    x = x.detach().requires_grad_()
    v = v.detach().requires_grad_()

    y = f(x)
    vjp_val = torch.autograd.grad(y, x, v, create_graph=True)[0]
    jvp_val = torch.autograd.grad(vjp_val, v, v.detach())[0]

    return (vjp_val - jvp_val).norm(1)
```

Note that because of the detach at the beginning, no gradient will flow back to the input `x` or `v` if you try to backward the result of that function. Is that something you want?

Thank you, this seems to return correct results. One question though: why did you put `v.detach()` in the expression for `jvp_val`? I.e., why would something like this not work correctly?

```python
def get_loss_fast(f, x, v):
    x = x.detach().requires_grad_()
    v = v.detach().requires_grad_()

    y = f(x)
    vjp_val = torch.autograd.grad(y, x, v, create_graph=True)[0]
    jvp_val = torch.autograd.grad(vjp_val, v, v)[0]

    return (vjp_val - jvp_val).norm(1)
```

> Because of the detach at the beginning, no gradient will flow back to the input x or v if you try to backward the result of that function. Is that something you want?

Yes, it is exactly as I intend.

Just because we don’t want to create the graph for that backward, so I was making the grad_outputs not require gradients. But I guess that is not strictly necessary, as we do not set `create_graph`.

Since I want to backprop through the resulting loss, I (think I) need to set `create_graph=True` in the second call to `grad()`. Otherwise, I am getting the following:

```
# loss = get_loss_fast(f, x, v)
# loss.backward()

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.
```

where `f` is an `nn.Linear`, and `x` and `v` are two random vectors. When I specify `create_graph=True`, there is no such error.
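A minimal sketch reproducing that error (using an element-wise `f` rather than an `nn.Linear`, which does not change the mechanism): without `create_graph=True`, the second `grad()` call also frees the buffers of the graph that produced `vjp_val`, so the final `backward()` tries to traverse a freed graph:

```python
import torch

x = torch.rand(3, requires_grad=True)
v = torch.rand(3, requires_grad=True)

y = x**3
vjp_val, = torch.autograd.grad(y, x, v, create_graph=True)
# retain_graph defaults to create_graph (False here), so this call frees
# the buffers of the graph behind vjp_val
jvp_val, = torch.autograd.grad(vjp_val, v, v)
loss = (vjp_val - jvp_val).norm(1)

try:
    loss.backward()  # walks the freed graph a second time
except RuntimeError as e:
    print("RuntimeError:", e)
```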

In any case, as far as I understand, the performance penalty for either doing a `.detach()` or doing a redundant gradient computation is negligible, so there is little point in benchmarking whether this `.detach()` is useful or not.

A working version of the code for posterity:

```python
import torch

def get_loss_fast(f, x, v):
    x = x.detach().requires_grad_()
    v = v.detach().requires_grad_()

    y = f(x)
    vjp_val = torch.autograd.grad(y, x, v, create_graph=True)[0]
    jvp_val = torch.autograd.grad(vjp_val, v, v.detach(), create_graph=True)[0]

    return (vjp_val - jvp_val).norm(1)
```

All of the code versions are giving me a result of 0:

```python
x = torch.FloatTensor([1, 2, 3]).requires_grad_()
v = torch.FloatTensor([1, 1, 1]).requires_grad_()
y = x**3 - 6*x

vjp_val = torch.autograd.grad(y, x, v, create_graph=True)[0]
jvp_val = torch.autograd.grad(vjp_val, v, v, create_graph=True)[0]
print("jvp_val is {0}".format(jvp_val))
print((vjp_val - jvp_val).norm(1))
```

Results

```
tensor([ 6., 12., 18.], grad_fn=<CloneBackward>)
```
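For what it's worth, a zero loss is expected for this particular function: `x**3 - 6*x` acts element-wise, so its Jacobian is diagonal and hence symmetric, which makes `J v` and `Jᵀ v` identical for every `v`. A sketch (with a hypothetical helper `loss_fn` mirroring the `get_loss_fast` above) that contrasts this with a non-symmetric Jacobian:

```python
import torch

def loss_fn(f, x, v):
    x = x.detach().requires_grad_()
    v = v.detach().requires_grad_()
    y = f(x)
    vjp_val, = torch.autograd.grad(y, x, v, create_graph=True)
    jvp_val, = torch.autograd.grad(vjp_val, v, v.detach(), create_graph=True)
    return (vjp_val - jvp_val).norm(1)

x = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([1.0, 1.0, 1.0])

# non-symmetric Jacobian: J = W, so W v != W^T v in general
W = torch.tensor([[0.0, 1.0, 0.0],
                  [0.0, 0.0, 1.0],
                  [0.0, 0.0, 0.0]])

elementwise = loss_fn(lambda t: t**3 - 6*t, x, v)
linear = loss_fn(lambda t: W @ t, x, v)
print(elementwise.item() < 1e-6)   # True: diagonal (symmetric) Jacobian
print(linear.item() > 0)           # True: vjp and jvp differ
```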

Well if you want gradients, you might want to double check that.
In particular, you will need to remove the detach at the beginning and make sure they require gradients.

The following should work:

```python
import torch

def f(x):
    return 2 + x**4 + x**5

def get_loss_fast(x, v):
    y = f(x)
    vjp_val = torch.autograd.grad(y, x, v, create_graph=True)[0]
    jvp_val = torch.autograd.grad(vjp_val, v, v, create_graph=True)[0]

    return (vjp_val - jvp_val).norm(1)

inp = (torch.rand(3, requires_grad=True), torch.rand(3, requires_grad=True))
get_loss_fast(*inp).backward()
```

In my particular case, I only need the gradients for `f`, and not for `x` or `v`. Thanks for pointing this out though!
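For posterity, assuming a reasonably recent PyTorch (1.5+), `torch.autograd.functional` ships ready-made `vjp` and `jvp` helpers (the functional `jvp` is itself implemented with the same double-backward trick), which can be used to sanity-check the hand-rolled versions above:

```python
import torch
from torch.autograd.functional import vjp, jvp

def f(x):
    return x**3 - 6*x

x = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([1.0, 1.0, 1.0])

# both helpers return a (function_output, product) pair
_, vjp_val = vjp(f, x, v)
_, jvp_val = jvp(f, x, v)

# for an element-wise f the Jacobian is diagonal, so both products
# equal f'(x) * v
print(torch.allclose(vjp_val, (3*x**2 - 6) * v))  # True
print(torch.allclose(jvp_val, (3*x**2 - 6) * v))  # True
```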