Getting an all-zero answer when calculating the Jacobian in PyTorch using the built-in jacobian function

I am trying to compute a Jacobian matrix between two vectors; the result should be a matrix. Here is my code:

import torch
from torch.autograd.functional import jacobian
from torch import tensor

def get_f (x):      
    f=torch.arange(0,3, requires_grad=True, dtype=torch.float64)
    for i in looparray:
        with torch.no_grad():   
            f[i] = x[i]**2    
    return f
    
looparray=torch.arange(0,3)
x=torch.arange(0,3, requires_grad=True, dtype=torch.float64)
J = jacobian(get_f, x).detach().numpy()

Mathematically, the Jacobian is computed as

df[0]/dx[0]   df[0]/dx[1]  df[0]/dx[2]
df[1]/dx[0]   df[1]/dx[1]  df[1]/dx[2]
df[2]/dx[0]   df[2]/dx[1]  df[2]/dx[2]

It should be

0 0 0
0 2 0
0 0 4

But the result from the code is

0 0 0
0 0 0
0 0 0

And I must update f in place inside get_f; for example, f might become

 f[0] = x[0]**2
 f[1] = x[0]**2+x[1]**2
 f[2] = x[1]**2+x[2]**2

So a for loop with in-place updates is really necessary in my case (the 3x3 here is just an example; it could be far larger).

Remove the with torch.no_grad(): line. It's resulting in no gradients flowing through your function get_f.

Hey, no it won't; it only affects the tracking of this indexing operation. And if I remove it, an error like this pops up:

RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
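
For reference, here is a minimal snippet that triggers the same kind of error (I assume it comes from writing into the pre-allocated f, which is a leaf tensor created with requires_grad=True):

import torch

# f is a leaf tensor that requires grad, so writing into it in place is rejected
f = torch.arange(0, 3, requires_grad=True, dtype=torch.float64)
f[0] = 1.0  # raises a RuntimeError about an in-place operation on a leaf Variable that requires grad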

It's best not to pre-allocate arrays and update them in place within the function. The following code works:

import torch
from torch.autograd.functional import jacobian
from torch import tensor

def get_f(x):
  return x**2

x=torch.arange(0,3, requires_grad=True, dtype=torch.float64)
J = jacobian(get_f, x).detach().numpy()
print(J)
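
For x = [0, 1, 2] this prints the expected diagonal Jacobian [[0, 0, 0], [0, 2, 0], [0, 0, 4]].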

Thank you, but I must update in place. Any ideas?

Autograd doesn’t work with in-place updates. How would you define the gradient of placing a scalar within an array?

Thank you, but are you sure there is no way to make this work with in-place updates? I remember doing it once before. Also, if I want a more complicated f, with different updates for different indices, how can I do that?

Why do you need to do it via in-place updates? Why can't you just return a Tensor that the function creates itself? Do you have an example to share?

Like if f becomes

 f[0] = x[0]**2
 f[1] = x[0]**2+x[1]**2
 f[2] = x[1]**2+x[2]**2

Why don't you just define a function like this,

def f(x,y,z):
  return x**2, x**2+y**2, y**2+z**2

(or the equivalent index version)?
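
For what it's worth, jacobian also accepts a tuple of inputs and a function that returns a tuple of outputs, so a multi-argument version could look roughly like this (just a sketch; the scalar input values are made up for illustration):

import torch
from torch.autograd.functional import jacobian

def f(x, y, z):
    return x**2, x**2 + y**2, y**2 + z**2

# with a tuple of inputs and a tuple of outputs, jacobian returns a nested tuple
# where J[i][j] is d(output_i)/d(input_j)
inputs = (torch.tensor(0.0), torch.tensor(1.0), torch.tensor(2.0))
J = jacobian(f, inputs)
print(J[1][0])  # d(x**2 + y**2)/dx at x=0 -> tensor(0.)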

Thank you, but this is just a 3-by-3 example. In my case I may have hundreds of variables, which is why I use a for loop.

If you're doing it for multiple variables, you could forgo pre-defining f like above and use torch.stack with a list comprehension to get AD to work. So, for example,

x = torch.arange(1,5,dtype=torch.float32, requires_grad=True)
out = torch.stack([ x[i]**2 for i in range(x.shape[0])], 0)
print(out.grad_fn) #returns <StackBackward0 object at 0x7f3415b34610>
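
Feeding that stacked output into jacobian then gives non-zero gradients; a quick check with the same x:

import torch
from torch.autograd.functional import jacobian

def get_f(x):
    # list comprehension + torch.stack keeps every element on the autograd graph
    return torch.stack([x[i]**2 for i in range(x.shape[0])], 0)

x = torch.arange(1, 5, dtype=torch.float32, requires_grad=True)
print(jacobian(get_f, x))  # diagonal matrix with 2*x on the diagonal: 2., 4., 6., 8.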

I am sorry, but I didn't get it. There are multiple variables, and f has multiple outputs too.

One thing I don't understand is why the updates need to be in place. With your example above, you can try something like this,

import torch
from torch.autograd.functional import jacobian
from torch import tensor
"""
#old function with f pre-allocated
def get_f (x):      
  f=torch.arange(0,3, requires_grad=True, dtype=torch.float64)
  for i in looparray:
    f[i] = x[i]**2    
  return f
"""
#new func
def get_f(x):
  f0 = x[0]**2
  f1 = x[0]**2+x[1]**2
  f2 = x[1]**2+x[2]**2
  return torch.stack([f0, f1, f2], dim=0)

x=torch.arange(0,3, requires_grad=True, dtype=torch.float64)
J = jacobian(get_f, x).detach().numpy()

print(J) #returns 
#[[0. 0. 0.]
# [0. 2. 0.]
# [0. 2. 4.]]

When you calculate gradients via automatic differentiation (AD), you need to construct the output from the inputs, because AD uses the output Tensor to calculate the gradients; that is why I don't think you can use in-place updates. When you have a constant Tensor, I don't think AD can track what the derivative would be. But it'd be best to get a dev to confirm the exact details.
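
One rough way to see this is to compare grad_fn on the two kinds of output: the stacked Tensor is attached to the graph, while a pre-allocated Tensor filled under torch.no_grad() is not, which is why its Jacobian comes out as all zeros:

import torch

x = torch.arange(0, 3, requires_grad=True, dtype=torch.float64)

# output built out-of-place from x: stays on the autograd graph
out = torch.stack([x[i]**2 for i in range(3)])
print(out.grad_fn)  # <StackBackward0 object at ...>

# pre-allocated tensor filled under no_grad: not connected to x at all
f = torch.zeros(3, dtype=torch.float64)
with torch.no_grad():
    for i in range(3):
        f[i] = x[i]**2
print(f.grad_fn)  # None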

Now I get your idea, but I still have a problem. The reason for the in-place updates is that I may have many entries of f; I cannot write f1=, f2=, f3= all the way to f100= by hand, and I also need +=. I'll send you the exact example via message, since it is too long to post here.

That is why I use a for loop (with in-place/indexed assignment) to construct f.

If it's too long to implement directly, you could calculate each value, append it to a list, and then stack the list into a single Tensor (like the example above). Also, make sure to avoid += and *=, as these are in-place operations (not supported by AD; you need the out-of-place equivalent), and don't use torch.no_grad(), as that stops AD from tracking gradients.

For reference, the out-of-place version looks like this,

x *= 5     # in-place
x = x * 5  # out-of-place

and the append-then-stack pattern is,

values = []
for i in range(3):
  val = calc_value(i)  # your function for a given index
  values.append(val)   # append to list
values = torch.stack(values)  # converts to a Tensor, do NOT use torch.tensor()

Make sure to use torch.stack instead of torch.tensor: torch.tensor creates a new Tensor that is detached from the computational graph of your values, which will give no gradient.
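
Putting all of that together for the looped case, a sketch of what get_f could look like (assuming each entry accumulates terms the way your f[1] = x[0]**2 + x[1]**2 example does, with out-of-place additions instead of +=):

import torch
from torch.autograd.functional import jacobian

def get_f(x):
    values = []
    for i in range(x.shape[0]):
        val = x[i]**2                # this index's own term
        if i > 0:
            val = val + x[i - 1]**2  # out-of-place accumulation instead of +=
        values.append(val)
    return torch.stack(values)       # stack, NOT torch.tensor, to keep the graph

x = torch.arange(0, 5, requires_grad=True, dtype=torch.float64)
J = jacobian(get_f, x).detach().numpy()
print(J)  # banded matrix with 2*x[i] on the diagonal and 2*x[i-1] on the subdiagonal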


I see: instead of providing the whole f at once, I "push back" each element into f.

But don't pre-allocate the memory for f; you'll need to create an empty list each time in order to get derivatives.

I just found that in my case f is sometimes not computed in order; some data is added later and previous locations get revisited. What can I do about that?

I don't know why it's producing the output in a different order, but that's something you could solve by reordering the output or by making your function deterministic instead; one possible approach is sketched below. Good luck!
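
A rough sketch of that idea (just one way to do it, assuming the out-of-order updates can be collected as index/value pairs): accumulate each index's contributions in a dict with out-of-place additions and only stack once at the end, in index order:

import torch
from torch.autograd.functional import jacobian

def get_f(x):
    contributions = {}
    # hypothetical updates arriving out of order and revisiting indices
    updates = [(2, x[1]**2), (0, x[0]**2), (1, x[0]**2), (2, x[2]**2), (1, x[1]**2)]
    for i, term in updates:
        if i in contributions:
            contributions[i] = contributions[i] + term  # out-of-place "+="
        else:
            contributions[i] = term
    # stack in index order so the output is always ordered the same way
    return torch.stack([contributions[i] for i in sorted(contributions)])

x = torch.arange(0, 3, requires_grad=True, dtype=torch.float64)
print(jacobian(get_f, x))  # [[0, 0, 0], [0, 2, 0], [0, 2, 4]]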