a=Variable(torch.randn(1),requires_grad=True)
p=a*a
p_tmp = p.expand_as(p)
grad_acc = p_tmp.grad_fn.next_functions[0][0]
I don't know the meaning of 'next_functions', and I couldn't find its definition.
First, don't use Variable anymore! Be modern!
Regarding your question, the next_functions will allow you to traverse the recorded calculation graph (“backward graph”).
The backward graph will end in AccumulateGrad nodes for the leaves (they have a .variable
attribute pointing to the leaf tensor) - and yours does pretty quickly as you only have one operation. Let’s have a slightly more elaborate one:
a = torch.randn(1, requires_grad=True)
b = a*(a+2)
print(b.grad_fn.next_functions)
print(b.grad_fn.next_functions[1][0].next_functions)
print(b.grad_fn.next_functions[0][0].variable is a)
gives
((<AccumulateGrad object at 0x7fbe7aa96780>, 0), (<AddBackward0 object at 0x7fbe7aa96748>, 0))
((<AccumulateGrad object at 0x7fbe7aa96780>, 0), (None, 0))
True
So in a*(a+2) you have one branch for a and one for a+2, and the latter has an a branch and an uninteresting 2 branch.
Except at the leaves, you cannot, in general, access the variables of the calculation from the graph.
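If it helps, here is a minimal sketch (not an official API, just walking the attributes discussed above) that recursively follows next_functions and prints the backward graph, marking the AccumulateGrad leaves:

import torch

def print_graph(fn, depth=0):
    # None entries correspond to inputs that do not require gradients
    if fn is None:
        print("  " * depth + "None")
        return
    name = type(fn).__name__
    # AccumulateGrad leaves carry a .variable attribute pointing at the leaf tensor
    if hasattr(fn, "variable"):
        name += " (leaf of shape {})".format(tuple(fn.variable.shape))
    print("  " * depth + name)
    for next_fn, input_nr in fn.next_functions:
        print_graph(next_fn, depth + 1)

a = torch.randn(1, requires_grad=True)
b = a * (a + 2)
print_graph(b.grad_fn)

For this b it should print a MulBackward0 node with an AccumulateGrad leaf and an AddBackward0 branch underneath (the latter ending in the same leaf and a None entry).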
Best regards
Thomas
@tom thanks very much! I'll remember that and be modern.
Thanks for your explanation. Can you explain what the second element in each of these inner tuples stands for? They are all 0, and I cannot create an example causing a different value. Can they be, e.g., 1?
The number is the input number to the next backward function, so it can only be non-zero when a function has multiple differentiable outputs (there aren't that many, but e.g. the RNN functions typically do).
A minimal example that doesn't serve much purpose except showing you a 1 is:
a, b = torch.randn(2, requires_grad=True).unbind()
c = a+b
print(c.grad_fn.next_functions)
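On a reasonably recent PyTorch this prints something along these lines (the exact class name and addresses will differ between versions and runs):

((<UnbindBackward0 object at 0x...>, 0), (<UnbindBackward0 object at 0x...>, 1))

Both entries point at the same unbind backward node, but the second one feeds its input number 1.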
Unbind is not that well-known; it is the "opposite" of stack, splitting a tensor along one dimension (the first by default) into a tuple of slices.
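For illustration (a small sketch of that relationship):

import torch

t = torch.arange(6).reshape(3, 2)
pieces = t.unbind()                         # tuple of 3 tensors, each of shape (2,)
assert torch.equal(torch.stack(pieces), t)  # stacking the pieces restores t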
Best regards
Thomas
Thanks, again. Now I understand.
Sorry for bothering you again. Off topic, but probably not worth a new thread: do you have an explanation for the following behaviour regarding unbind()?
a = torch.randn(2, requires_grad=True)
a0, a1 = a.unbind()
# a0, a1 = a # but this works
a0.backward()
Causes this error:
RuntimeError: Expected a Tensor of type Variable but found an undefined Tensor at position #1 for iterable argument #0 'tensors'
Best regards,
Saluto
fixed in master:
Best regards
Thomas
Really great, thanks!
Hi Tom,
what about the tuple (None, 0)? It seems to be the backprop function for the constant 2.
Indeed. Internally, PyTorch loves to connect all relevant inputs to the graph, and if those don’t require gradients, you get None.
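For instance (a minimal illustration), a tensor input that does not require gradients shows up as such a None entry:

import torch

a = torch.randn(1, requires_grad=True)
c = torch.randn(1)               # requires_grad is False
d = a * c
print(d.grad_fn.next_functions)  # e.g. ((<AccumulateGrad object at 0x...>, 0), (None, 0))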
Thanks for the explanation. I'm actually running some test_autograd.py unit tests, and there is typically a gradgradcheck for a test function. Some functions failed gradgradcheck although they passed gradcheck, so I'm using the VS Code Python debugger to look into this case. I really need to understand how some functions, e.g. addcdiv, do the backward pass. Now I see how we can use grad_fn.next_functions to recover a lot of information, and the derivatives.yaml file already lists the backward formulas for all the functions. But my question is: how can we manually get the gradients for the inputs if we only have the output, or output.grad_fn?
As you mentioned, for example with y = x*2 we can backtrace to x from the AccumulateGrad object's variable attribute. But dy/dx = 2 is the gradient - where is this information stored?
To the best of my knowledge, not all "backward-functions" are exposed. In particular, those in FunctionsManual.cpp - backward computations that are themselves composed of PyTorch functions - are not exported.
(At least that was the state last time I looked. Personally, I think they should be available programmatically, but that didn't catch on.)
Best regards
Thomas
Thanks. In the case of addcdiv, when I do gradgradcheck, i.e. calculate the second-order derivatives, I just successfully called grad_fn(torch.ones(1, device='cuda:0')) manually to get the grad with respect to the inputs of that grad_fn. By looking at next_functions and then passing the grads from the previous step as inputs to those next functions, I can manually get the gradients down to the leaves eventually. However, for some of the scalars I have to do a manual tensor.sum() to get the scalar gradients in the end. So I think the grad_fn class does contain information such as what shapes are accepted and what the constants to mul/div are (e.g. in y = 2 * x, the 2 is a constant).
My manual calculations using grad_fn do coincide with what they should be. But on my machine the gradgrad results for addcdiv are not all correct, and I'm stuck at how to debug this. In the addcdiv example the backward-functions are all very simple, like SumBackward, MulBackward, NegBackward, DivBackward, AccumulateGrad, and I already have the graph of how they are chained together. But I don't know where I can set a breakpoint in the C++/CUDA source code in order to debug this. For example, if DivBackward for CUDA tensors has some bug, where should I put a breakpoint? Much appreciated.
Right, calling the grad_fn works these days.
So there are three parts:
If you track down the functions to step 2 of the double backward, you could set breakpoints there.
Best regards
Thomas
By the way: Do you have a (smallish) reproducing example of the failing gradgradcheck? It is very likely that people would be keen to help debug it if they can reproduce the error.
I have a question about an operation between a scalar and a tensor (the multiplication in the snippet below). My manual backward pass crashes with a shape-mismatch error. It seems like y (a scalar) is implicitly expanded to the same size as x (a tensor). But in this case, how is the gradient of y calculated through next_functions?
x = torch.randn(4,4, requires_grad=True)
y = torch.tensor(2., requires_grad=True)
z = x * y
l = z.sum()
dl = torch.tensor(1.)
back_sum = l.grad_fn
dz = back_sum(dl)
back_mul = back_sum.next_functions[0][0]
dx, dy = back_mul(dz)
back_x = back_mul.next_functions[0][0]
back_x(dx)
back_y = back_mul.next_functions[1][0]
back_y(dy)
RuntimeError: output with shape [] doesn't match the broadcast shape [4, 4]
12207 back_y = back_mul.next_functions[1][0]
> 12208 back_y(dy)
So the backward of y = x.expand(y_size) is dl_dx = dl_dy.sum_to_size(x.size()).
Just as broadcasting inserts implicit expands, the autograd engine will insert implicit "expand backwards" in the form of sum_to_size. If you do the backward manually, you have to do the sum_to_size yourself.
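For the snippet above that would look roughly like this (a sketch; since y is 0-dimensional, dy.sum() would do the same):

# dy comes out of back_mul with z's broadcast shape [4, 4];
# reduce it to y's shape before handing it to y's AccumulateGrad node
back_y(dy.sum_to_size(y.shape))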
Best regards
Thomas