a=Variable(torch.randn(1),requires_grad=True)
p=a*a
p_tmp = p.expand_as(p)
grad_acc = p_tmp.grad_fn.next_functions[0][0]
I don't know the meaning of 'next_functions', and I couldn't find its definition.
First, don't use Variable anymore! Be modern!
Regarding your question, the next_functions will allow you to traverse the recorded calculation graph (“backward graph”).
The backward graph will end in AccumulateGrad nodes for the leaves (they have a .variable
attribute pointing to the leaf tensor) - and yours does pretty quickly as you only have one operation. Let’s have a slightly more elaborate one:
a = torch.randn(1, requires_grad=True)
b = a*(a+2)
print(b.grad_fn.next_functions)
print(b.grad_fn.next_functions[1][0].next_functions)
print(b.grad_fn.next_functions[0][0].variable is a)
gives
((<AccumulateGrad object at 0x7fbe7aa96780>, 0), (<AddBackward0 object at 0x7fbe7aa96748>, 0))
((<AccumulateGrad object at 0x7fbe7aa96780>, 0), (None, 0))
True
So in a*(a+2) you have one branch for a and one for a+2, and the latter has an a branch and an uninteresting 2 branch.
Except at the leaves, you cannot, in general, access the variables of the calculation from the graph.
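If it helps, here is a minimal sketch (not an official API, just walking the attributes discussed above) that recursively follows next_functions and prints the backward graph, marking the AccumulateGrad leaves:

import torch

def print_graph(fn, depth=0):
    # None entries correspond to inputs that do not require gradients
    if fn is None:
        print("  " * depth + "None")
        return
    name = type(fn).__name__
    # AccumulateGrad leaves carry a .variable attribute pointing at the leaf tensor
    if hasattr(fn, "variable"):
        name += " (leaf of shape {})".format(tuple(fn.variable.shape))
    print("  " * depth + name)
    for next_fn, input_nr in fn.next_functions:
        print_graph(next_fn, depth + 1)

a = torch.randn(1, requires_grad=True)
b = a * (a + 2)
print_graph(b.grad_fn)

For this b it should print a MulBackward0 node with an AccumulateGrad leaf and an AddBackward0 branch underneath (the latter ending in the same leaf and a None entry).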
Best regards
Thomas
@tom thanks very much! I'll remember that and be modern.
Thanks for your explanation. Can you explain what the second element in each of these inner tuples stands for? They are all 0, and I cannot create an example causing a different value. Can they be, e.g., 1?
The number is the input number to the next backward function, so it can only be non-zero when a function has multiple differentiable outputs (there aren't that many, but e.g. the RNN functions typically do).
A minimal example that doesn't serve much purpose except showing you a 1 is:
a, b = torch.randn(2, requires_grad=True).unbind()
c = a+b
print(c.grad_fn.next_functions)
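On a reasonably recent PyTorch this prints something along these lines (the exact class name and addresses will differ between versions and runs):

((<UnbindBackward0 object at 0x...>, 0), (<UnbindBackward0 object at 0x...>, 1))

Both entries point at the same unbind backward node, but the second one feeds its input number 1.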
Unbind is not that well-known; it is the "opposite" of stack, splitting a tensor along one dimension (the first by default) into a tuple of slices.
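For illustration (a small sketch of that relationship):

import torch

t = torch.arange(6).reshape(3, 2)
pieces = t.unbind()                         # tuple of 3 tensors, each of shape (2,)
assert torch.equal(torch.stack(pieces), t)  # stacking the pieces restores t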
Best regards
Thomas
Thanks, again. Now I understand.
Sorry for bothering you again. Off topic, but probably not worth a new thread: do you have an explanation for the following behaviour regarding unbind()?
a = torch.randn(2, requires_grad=True)
a0, a1 = a.unbind()
# a0, a1 = a # but this works
a0.backward()
Causes this error:
RuntimeError: Expected a Tensor of type Variable but found an undefined Tensor at position #1 for iterable argument #0 'tensors'
Best regards,
Saluto
fixed in master:
Best regards
Thomas
Really great, thanks!
Hi Tom,
what about the tuple (None, 0)? It seems to be the backprop function for the constant 2.
Indeed. Internally, PyTorch loves to connect all relevant inputs to the graph, and if those don’t require gradients, you get None.
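For instance (a minimal illustration), a tensor input that does not require gradients shows up as such a None entry:

import torch

a = torch.randn(1, requires_grad=True)
c = torch.randn(1)               # requires_grad is False
d = a * c
print(d.grad_fn.next_functions)  # e.g. ((<AccumulateGrad object at 0x...>, 0), (None, 0))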
Thanks for the explanation. I'm actually running some test_autograd.py unit tests, and there is typically a gradgradcheck for a test function. Some functions failed gradgradcheck although they passed gradcheck, so I'm using the VS Code Python debugger to look into this case. I really need to understand how some functions, e.g. addcdiv, do the backward pass. Now I see how we can use grad_fn.next_functions to recover a lot of information, and the derivatives.yaml file already lists the backward formulas for all the functions. But my question is: how can we manually get the gradients for the inputs if we only have the output, or output.grad_fn?
As you mentioned, for example with y = x*2 we can backtrace to x from the AccumulateGrad object's variable attribute. But dy/dx = 2 is the gradient - where is this information stored?
To the best of my knowledge, not all "backward-functions" are exposed. In particular, those in FunctionsManual.cpp - backward computations that are themselves composed of PyTorch functions - are not exported.
(At least that was the state last time I looked. Personally, I think they should be available programmatically, but that didn't catch on.)
Best regards
Thomas
Thanks. In the case of addcdiv, when I do gradgradcheck, i.e. calculate the second-order derivatives, I just successfully called grad_fn(torch.ones(1, device='cuda:0')) manually to get the grad with respect to the inputs of that grad_fn. By looking at next_functions and then passing the grads from the previous step as inputs to those next functions, I can manually get the gradients down to the leaves eventually. However, for some of the scalars I have to do a manual tensor.sum() to get the scalar gradients in the end. So I think the grad_fn class does contain information such as what shapes are accepted and what the constants to mul/div are (e.g. in y = 2 * x, the 2 is a constant).
My manual calculations using grad_fn do coincide with what they should be. But on my machine the gradgrad results for addcdiv are not all correct, and I'm stuck at how to debug this. In the addcdiv example the backward-functions are all very simple, like SumBackward, MulBackward, NegBackward, DivBackward, AccumulateGrad, and I already have the graph of how they are chained together. But I don't know where I can set a breakpoint in the C++/CUDA source code in order to debug this. For example, if DivBackward for CUDA tensors has some bug, where should I put a breakpoint? Much appreciated.
Right, calling the grad_fn works these days.
So there are three parts:
If you track down the functions to step 2 of the double backward, you could set breakpoints there.
Best regards
Thomas
By the way: Do you have a (smallish) reproducing example of the failing gradgradcheck? It is very likely that people would be keen to help debug it if they can reproduce the error.
I have a question about an operation between a scalar and a tensor (the multiplication in the snippet below). My manual backward pass crashes with a shape-mismatch error. It seems like y (a scalar) is implicitly expanded to the same size as x (a tensor). But in this case, how is the gradient of y calculated through next_functions?
x = torch.randn(4,4, requires_grad=True)
y = torch.tensor(2., requires_grad=True)
z = x * y
l = z.sum()
dl = torch.tensor(1.)
back_sum = l.grad_fn
dz = back_sum(dl)
back_mul = back_sum.next_functions[0][0]
dx, dy = back_mul(dz)
back_x = back_mul.next_functions[0][0]
back_x(dx)
back_y = back_mul.next_functions[1][0]
back_y(dy)
RuntimeError: output with shape [] doesn't match the broadcast shape [4, 4]
12207 back_y = back_mul.next_functions[1][0]
> 12208 back_y(dy)
So the backward of y = x.expand(y_size) is dl_dx = dl_dy.sum_to_size(x.size()).
Just as broadcasting inserts implicit expands, the autograd engine will insert implicit "expand backwards" in the form of sum_to_size. If you do the backward manually, you have to do the sum_to_size yourself.
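For the snippet above that would look roughly like this (a sketch; since y is 0-dimensional, dy.sum() would do the same):

# dy comes out of back_mul with z's broadcast shape [4, 4];
# reduce it to y's shape before handing it to y's AccumulateGrad node
back_y(dy.sum_to_size(y.shape))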
Best regards
Thomas