Why is this computation graph failing?

I am currently playing with the autograd engine to understand its implementation better. I made up a very simple computation graph, as shown in the code below:

import torch



A = torch.arange(16, requires_grad=True, dtype=torch.float32)
B = torch.arange(32, requires_grad=True, dtype=torch.float32)
C = torch.arange(4, requires_grad=True, dtype=torch.float32)
D = torch.arange(16, requires_grad=True, dtype=torch.float32)
 
A_ = A.reshape(4,4)+1
B_ = B.reshape(2,4,4)
C_ = C.reshape(4,1)
D_ = D.reshape(4,4)

E = A_ + B_
F = E@C_
G = F * C_
T = G.swapdims(1,2)
R = T.sum()
print(R)

H = B_ + D_
I = H / A_
R_ = I.mean()
Z = R_ * G
Res = Z.sum()
print(Res)

G.retain_grad()
E.retain_grad()
F.retain_grad()
A_.retain_grad()
B_.retain_grad()
C_.retain_grad()

R.backward()
#H.retain_grad()
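# the second backward below raises the RuntimeError quoted after the code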
Res.backward()

print("== leaf tensor ==")
print(A.grad)
print(B.grad)
print(C.grad)
print(D.grad)

This code throws an error “RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.”

I believe this is due to the tensor G being reused in the second computation graph.
My first question is: why? Does autograd clear the entire computation graph, or only the nodes used during backprop? Since G is part of another graph that was not used during the first backward call, I expected the G tensor, and all the tensors used to compute it, not to be freed after the first backward call.

Based on your screenshot, it seems G was indeed used in the first backward via:

T = G.swapdims(1, 2)
R = T.sum()
...
R.backward()

Afterwards you are calling Res.backward(), where Res was created from G again via:

Z = R_ * G
Res = Z.sum()
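
Here is a minimal sketch of the same mechanism, reduced to my own toy example (not taken from your snippet): two losses share an intermediate tensor whose graph saves activations, and the second backward hits the same RuntimeError:

import torch

x = torch.ones(3, requires_grad=True)
g = (x * x).relu()    # g's history saves tensors (x for the mul, the result for the relu)
loss1 = g.sum()
loss2 = (g * 2.0).sum()

loss1.backward()      # frees the saved tensors along g's graph
loss2.backward()      # needs g's graph again and raises the same RuntimeError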

PS: It’s always better to post code snippets by wrapping them in three backticks ```, as this allows us to copy/paste the code to reproduce the issue and lets the search engine index it.

Thank you, I edited my question accordingly. So autograd will, by default, free tensors after backprop even if they are still required by another part of the computation graph that has not been backpropagated through yet?

Also, I have a follow-up question: replacing G with C_ in the second computation as follows

Z = R_ * C_

will not raise a RuntimeError. C_ is not a leaf node (it is the result of a reshape op), so by default it should be freed after the first backward call, shouldn’t it? Why does this still work?

Yes, since the default eager mode isn’t aware of future calls and dependencies. You would thus need to use retain_graph=True in the first backward call to allow the second one to reuse the same intermediate activations and to free them afterwards.
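
Applied to your snippet, that would be the following change (a minimal sketch; retain_graph is a standard keyword argument of backward()):

R.backward(retain_graph=True)   # keep the saved intermediates alive for the second backward
Res.backward()                  # can now backprop through G's graph and frees it afterwards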

No, it won’t be freed since you’ve explicitly created a reference to it. You can still access and print C_ after the backward calls.
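
For instance, both of these still work after the backward calls in your snippet:

print(C_)       # the Python name C_ still references the tensor, so its values are accessible
print(C_.grad)  # populated as well, since C_.retain_grad() was called before the first backward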

Thank you for the clarifications. Can you elaborate a bit more on how I’ve explicitly created a reference to C_? I don’t see the difference between how G is created and how C_ is created.

By creating the assignment:

C_ = C.reshape(4,1)

The previous issue is not caused by G itself being deleted, but by its computation graph being freed, including the intermediate activations needed for the gradient computation.
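
One way to see the difference (a sketch; the _saved_* attributes on grad_fn nodes are exposed in recent PyTorch versions):

print(G)                      # the tensor G itself is still alive and printable
print(G.grad_fn)              # its MulBackward0 node also still exists
print(G.grad_fn._saved_self)  # but the inputs it saved were freed by the first backward,
                              # so this access raises the same RuntimeError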

OK, I get it, thank you for clarifying the matter!