In grad_fn I found a next_functions attribute, but I don't understand what it means.

import torch
from torch.autograd import Variable

a = Variable(torch.randn(1), requires_grad=True)
p = a * a
p_tmp = p.expand_as(p)
grad_acc = p_tmp.grad_fn.next_functions[0][0]

I don’t know what ‘next_functions’ means, and I couldn’t find where it is defined.


First, don’t use Variable anymore! Be modern!

Regarding your question, the next_functions will allow you to traverse the recorded calculation graph (“backward graph”).
The backward graph will end in AccumulateGrad nodes for the leaves (they have a .variable attribute pointing to the leaf tensor) - and yours does pretty quickly as you only have one operation. Let’s have a slightly more elaborate one:

import torch

a = torch.randn(1, requires_grad=True)
b = a * (a + 2)
print(b.grad_fn.next_functions)
print(b.grad_fn.next_functions[1][0].next_functions)
print(b.grad_fn.next_functions[0][0].variable is a)

gives

((<AccumulateGrad object at 0x7fbe7aa96780>, 0), (<AddBackward0 object at 0x7fbe7aa96748>, 0))
((<AccumulateGrad object at 0x7fbe7aa96780>, 0), (None, 0))
True

So in ‘a*(a+2)’ you have one branch for a and one for a+2, and the latter has an a branch and an uninteresting 2 branch.
Except at the leaves, you cannot in general access the variables of the calculation from the graph.
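
The traversal described above can be sketched as a small recursive walk over next_functions. This is only an illustration: the helper name walk and the label format are made up for this example and are not part of PyTorch.

```python
import torch


def walk(fn, depth=0, lines=None):
    """Collect an indented listing of the backward graph rooted at fn.

    `walk` is a hypothetical helper written for this example.
    """
    if lines is None:
        lines = []
    indent = "  " * depth
    if fn is None:
        # An input that does not require gradients shows up as None.
        lines.append(indent + "None (input does not require grad)")
        return lines
    label = type(fn).__name__
    if hasattr(fn, "variable"):
        # Only AccumulateGrad nodes expose the leaf tensor they belong to.
        label += f" -> leaf of shape {tuple(fn.variable.shape)}"
    lines.append(indent + label)
    for next_fn, _input_nr in fn.next_functions:
        walk(next_fn, depth + 1, lines)
    return lines


a = torch.randn(1, requires_grad=True)
b = a * (a + 2)
graph_lines = walk(b.grad_fn)
print("\n".join(graph_lines))
```

The listing starts at the node of the last operation and ends at the AccumulateGrad leaves, with the constant 2 appearing as a None entry.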

Best regards

Thomas


@tom thanks very much! I’ll remember that and be modern. :grin::grin:

Thanks for your explanation. Can you explain what the second element in each of these inner tuples stands for? They are all 0 and I cannot create an example causing a different value. Can they be e.g. 1?


The number is the input number to the next backward function, so it can only be non-zero when a function has multiple differentiable outputs (there aren’t that many, but e.g. the RNN functions typically do).
A minimal example that doesn’t serve much purpose except showing you a 1 is:

import torch

a, b = torch.randn(2, requires_grad=True).unbind()
c = a + b
print(c.grad_fn.next_functions)

Unbind is not that well-known. It is the “opposite” of stack: it splits a tensor along one dimension (the first by default) into a tuple of tensors.
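
To make the second tuple element visible, one could pull the numbers out of next_functions directly, and check the “opposite of stack” claim at the same time. A minimal sketch, where the variable names input_numbers and roundtrip are chosen just for this illustration:

```python
import torch

t = torch.randn(2, requires_grad=True)
a, b = t.unbind()  # one forward function with two differentiable outputs
c = a + b

# Each entry of next_functions is a (node, input number) pair.
print(c.grad_fn.next_functions)
input_numbers = [input_nr for _node, input_nr in c.grad_fn.next_functions]
print(input_numbers)

# stack undoes unbind along the same dimension.
roundtrip = torch.stack(t.unbind())
print(torch.equal(roundtrip, t))
```

Both edges lead to the backward node of unbind; the number tells that node which of its outputs the incoming gradient belongs to.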

Best regards

Thomas


Thanks, again. Now I understand.

Sorry for bothering you again. Off topic, but probably not worth a new thread: Do you have an explanation for the following behaviour regarding unbind()?

a = torch.randn(2, requires_grad=True)
a0, a1 = a.unbind()
# a0, a1 = a  # but this works
a0.backward()

Causes this error:

RuntimeError: Expected a Tensor of type Variable but found an undefined Tensor at position #1 for iterable argument #0 'tensors'

Best regards,
Saluto

fixed in master:

Best regards

Thomas

Really great, thanks!

Hi Tom,

what about the tuple (None, 0)?


it seems to be the backprop function for the constant 2.

Indeed. Internally, PyTorch loves to connect all relevant inputs to the graph, and if those don’t require gradients, you get None.
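
The same can be seen with an explicit tensor that does not require gradients instead of a Python constant. A small sketch (the names edges and has_grad_path are only for illustration):

```python
import torch

a = torch.randn(1, requires_grad=True)
c = torch.tensor([2.0])  # requires_grad defaults to False

edges = (a * c).grad_fn.next_functions
print(edges)

# True where the edge leads to a real node, False where it is None.
has_grad_path = [node is not None for node, _input_nr in edges]
print(has_grad_path)
```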


Thanks for the explanation. I’m running some of the test_autograd.py unit tests, and there is typically a gradgradcheck for each test function. Some functions fail gradgradcheck although they pass gradcheck, so I’m using the VS Code Python debugger to debug this case. I really need to understand how some functions, e.g. addcdiv, do the backward pass. Now I see how we can use grad_fn.next_functions to recover a lot of information, and the derivatives.yaml file already contains the backward formulas for all the functions. But my question is: how can we manually get the gradients for the inputs if we only have the output, or output.grad_fn?
As you mentioned, for example with y=x*2, we can trace back to x through the AccumulateGrad object’s variable attribute, but dy/dx = 2 is the gradient; where is this information stored?

To the best of my knowledge, not all “backward-functions” are exposed. In particular those in FunctionsManual.cpp, the backward computations which typically are composed themselves of PyTorch functions, are not themselves exported.
(At least that was the state last I looked. Personally, I think that they should be available programmatically, but it didn’t catch on.)

Best regards

Thomas


Thanks. In the case of addcdiv, gradgradcheck calculates the second-order derivatives. I just successfully called grad_fn(torch.ones(1, device='cuda:0')) manually to get the gradients for the inputs of this grad_fn. By looking at next_functions and passing the gradients from the previous step as inputs to those next functions, I can eventually get the gradients at the leaves manually. However, for some of the scalars I have to do a manual tensor.sum() to get the scalar gradients in the end. So I think the grad_fn object does contain the information, including which shapes are accepted and what the constants to multiply or divide by are (e.g. in y = 2 * x, the 2 is a constant).
So my manual calculations using grad_fn do coincide with what they should be. But on my machine, the gradgrad results for addcdiv are not all correct, so I’m stuck on how to debug this. In the addcdiv example, the backward functions are all very simple, like SumBackward, MulBackward, NegBackward, DivBackward, AccumulateGrad, and I already have the graph of how they are chained together. But I don’t know where I can set a breakpoint in the C++/CUDA source code in order to debug this. For example, if DivBackward for CUDA tensors has a bug, where should I put a breakpoint? Much appreciated.
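
The manual procedure described in this post can be sketched on the CPU with a single multiplication. This is a simplified stand-in, not the addcdiv case itself: the tensors x and w are made-up inputs of equal shape, so no broadcasting is involved.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
w = torch.tensor([4.0, 5.0, 6.0], requires_grad=True)
y = x * w

# Calling the node applies its backward formula to an incoming gradient
# and returns one gradient per input of the forward operation.
grad_x, grad_w = y.grad_fn(torch.ones(3))

# Reference values from the regular autograd engine.
ref_x, ref_w = torch.autograd.grad(y.sum(), (x, w))

print(torch.equal(grad_x, ref_x), torch.equal(grad_w, ref_w))
```

For a product, the gradient with respect to one factor is the incoming gradient times the other factor, which is what the node computes from the tensors it saved during the forward pass.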

Right, calling the grad_fn works these days.
So there are three parts:

  1. Part of the interface is generated at build time in torch/csrc/autograd/generated. This includes the code for the autograd Node classes like DivBackward.
  2. the generated wrapping code ultimately calls the function specified in tools/autograd/derivatives.yaml, which you can find in the source code (either in FunctionsManual.cpp or in aten/src/ATen/native/native_functions.yaml and implemented in ATen).
  3. the graph recording for the double backward is done by treating the PyTorch calls in the backward as regular PyTorch function calls and they again generate autograd nodes, so you are again in the situation of 1./2.

If you track down the functions to step 2 of the double backward, you could set breakpoints there.

Best regards

Thomas

By the way: Do you have a (smallish) reproducing example of the failing gradgradcheck? It is very likely that people would be keen to help debug it if they can reproduce the error.
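
A reproducing example could take roughly the following shape. This is only a sketch under assumptions: it runs on the CPU in double precision with made-up input ranges (the denominator is kept away from zero), so it is expected to pass and would need the failing device and inputs substituted in.

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

torch.manual_seed(0)


def make_input(low, high):
    # Hypothetical helper: a double-precision leaf tensor in [low, high).
    t = torch.rand(3, dtype=torch.double) * (high - low) + low
    return t.requires_grad_()


inp = make_input(-1.0, 1.0)
t1 = make_input(-1.0, 1.0)
t2 = make_input(1.0, 2.0)  # denominator, bounded away from zero


def fn(i, a, b):
    return torch.addcdiv(i, a, b, value=0.5)


first_ok = gradcheck(fn, (inp, t1, t2))       # first-order check
second_ok = gradgradcheck(fn, (inp, t1, t2))  # second-order check
print(first_ok, second_ok)
```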


I have a question about a multiplication between a scalar and a tensor. The code below crashes with a shape mismatch error. It seems that y (a scalar) is implicitly expanded to the same size as x (a tensor). But in this case, how is the gradient of y calculated through next_functions?

import torch

x = torch.randn(4, 4, requires_grad=True)
y = torch.tensor(2., requires_grad=True)
z = x * y
l = z.sum()
dl = torch.tensor(1.)
back_sum = l.grad_fn
dz = back_sum(dl)
back_mul = back_sum.next_functions[0][0]
dx, dy = back_mul(dz)
back_x = back_mul.next_functions[0][0]
back_x(dx)

back_y = back_mul.next_functions[1][0]
back_y(dy)

RuntimeError: output with shape [] doesn't match the broadcast shape [4, 4]
  12207     back_y = back_mul.next_functions[1][0]
> 12208     back_y(dy)

So the backward of y = x.expand(y_size) is dl_dx = dl_dy.sum_to_size(x.size()).

Just as broadcasting inserts implicit expands, the autograd engine inserts implicit “expand backwards” in the form of sum_to_size. If you do the backward manually, you have to do the sum_to_size yourself.
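
Applied to the example from the question, the manual backward with the extra sum_to_size step might look like the sketch below. It stops before the AccumulateGrad nodes and compares against torch.autograd.grad instead, which is a choice made for this illustration only.

```python
import torch

x = torch.randn(4, 4, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
l = (x * y).sum()

back_sum = l.grad_fn
dz = back_sum(torch.tensor(1.0))
back_mul = back_sum.next_functions[0][0]
dx, dy = back_mul(dz)

# Undo the implicit expand of broadcasting: reduce each gradient
# back to the shape of the input it belongs to.
dx = dx.sum_to_size(x.size())  # already [4, 4], so unchanged
dy = dy.sum_to_size(y.size())  # [4, 4] -> []

# Reference values from the regular autograd engine.
ref_dx, ref_dy = torch.autograd.grad(l, (x, y))
print(dy.shape, torch.allclose(dx, ref_dx), torch.allclose(dy, ref_dy))
```

After the reduction, dy has the same shape as y, so it is in the form that the AccumulateGrad node of y expects.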

Best regards

Thomas
