Inconsistency in Tensor.backward

bala · August 1, 2019, 10:33am

According to the documentation, Tensor.backward frees the computation graph after the call unless retain_graph is set to True. This is exactly the behaviour I observed for the following example.

import torch

x = torch.ones(1, requires_grad=True)
y = torch.ones(1, requires_grad=True)
w = x + y
u = 2 * w
u.backward()
u.backward()  # second call on the same graph

The second backward call raises an error, which matches what the docs say.
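
For comparison, here is a minimal sketch of the same graph with retain_graph=True on the first call, which the docs describe as the way to keep the graph alive. The variable names first and second are only for illustration. Since gradients accumulate in .grad, the second call should add to the first rather than replace it.

```python
import torch

x = torch.ones(1, requires_grad=True)
y = torch.ones(1, requires_grad=True)
u = 2 * (x + y)

# Keep the graph's buffers so that backward can be called again.
u.backward(retain_graph=True)
first = x.grad.clone()   # du/dx = 2

# Second call: the gradient is accumulated into x.grad, not overwritten.
u.backward()
second = x.grad.clone()  # 2 + 2 = 4

print(first, second)
```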

But the following example, which calls backward twice in the same way, raises no error.

x = torch.ones(1, requires_grad=True)
y = torch.ones(1, requires_grad=True)
w = x + y
w.backward()
w.backward()  # second call on the same graph, no error here

Kindly clarify. This post is similar to https://discuss.pytorch.org/t/inconsistent-behavior-for-running-backward-twice-without-retain-graph-true/39917

ptrblck · August 1, 2019, 12:08pm

Similar to this question:

bala · August 2, 2019, 3:58am

Thanks for the response. I get it now. But I have found another issue, this time with the create_graph option of the backward method. The docs say that setting it to True builds the graph of the derivative, but they do not explain how that graph is built. I found an inconsistency when using this option, as shown in the following example:

import torch

x = torch.tensor(3., requires_grad=True)
y = x**3
print(f'{y.grad_fn}')
y.backward(create_graph=True)  # to facilitate computation of second order derivative of y
print(f"y': {x.grad}")

x.grad.data = torch.zeros_like(x)  # zero the stored gradient value
print(f'{x.grad.grad_fn}')
x.grad.backward(create_graph=True)  # to facilitate computation of third order derivative of y
print(f'y": {x.grad}')

x.grad.data = torch.zeros_like(x)  # zero the stored gradient value
print(f'{x.grad.grad_fn}')
x.grad.backward()
print(f'y^3: {x.grad}')

I get the results as follows:

<PowBackward0 object at 0x7fb0d3503e10>
y': 27.0
<CloneBackward object at 0x7fb0d3503dd8>
y": 18.0
<AddBackward0 object at 0x7fb0d3503908>
y^3: 24.0

As can be seen, the first and second derivatives are fine (I zeroed out the grad before computing the second derivative), but the third derivative is wrong: for y = x**3 it should be 6, not 24. Since CloneBackward is the grad_fn after the first call, I assumed that the graph of the derivative is simply a copy of the original graph, with the node for the expression y = x**3 replaced by the node for the expression x.grad. But I don’t understand how AddBackward0 becomes the grad_fn after the second call to backward with create_graph = True.

Seeking clarity on this.
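
For reference, the values I expect can be reproduced with torch.autograd.grad, which returns each derivative as a new tensor instead of accumulating it into x.grad. This is only a minimal sketch of what I am trying to compute, not a claim about how backward builds its graph; the names d1, d2 and d3 are mine.

```python
import torch

x = torch.tensor(3.0, requires_grad=True)
y = x ** 3

# create_graph=True makes each returned derivative differentiable in turn.
(d1,) = torch.autograd.grad(y, x, create_graph=True)   # 3 * x**2
(d2,) = torch.autograd.grad(d1, x, create_graph=True)  # 6 * x
(d3,) = torch.autograd.grad(d2, x)                     # constant 6

print(d1.item(), d2.item(), d3.item())
```

Here nothing is written to x.grad, so there is no stored gradient to zero out between the calls.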