With torch.no_grad():

Hi,
I am confused about the concept of torch.no_grad(). According to the PyTorch tutorials: "You can also stop autograd from tracking history on Tensors with `.requires_grad=True` by wrapping the code block in `with torch.no_grad():`". Now look at this code:

x = torch.tensor([2., 2], requires_grad=True)
y = x**2 + x
z = y.sum()
z.backward()
print(x.grad)
with torch.no_grad():
    x = x+1

z.backward()
print(x.grad)

The first print works correctly, but the second one does not: the second z.backward() fails with “RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.”, which makes sense based on the PyTorch website. Now my question: the training loop of a deep neural network looks like a similar situation. In each iteration, the weight update is not recorded in the history of the weights, yet in the next iteration autograd needs the history of the weights to calculate the gradients. So does PyTorch behave differently in these two scenarios?

w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):

    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())
    loss.backward()

    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        w1.grad.zero_()
        w2.grad.zero_()

Regarding your first code snippet, you need to run the forward pass again before calling z.backward() the second time, as in the following code:

x = torch.tensor([2., 2], requires_grad=True)
y = x**2 + x
z = y.sum()
z.backward()
print(x.grad)

with torch.no_grad():
    x = x+1

y = x**2 + x
z = y.sum()
z.backward()
print(x.grad)

However, since the new x is created inside the with torch.no_grad() block, it does not require grad and has no grad_fn, so no gradients can be computed for it, and you get the following error (which is expected):

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
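
A quick way to check this is to inspect the tensor produced inside the block. This is only a minimal sketch, assuming a standard PyTorch install; x_new is a name introduced here for illustration so that the original x stays available for comparison:

```python
import torch

x = torch.tensor([2., 2.], requires_grad=True)

with torch.no_grad():
    # Out-of-place add: this creates a brand-new tensor,
    # and autograd does not track how it was produced.
    x_new = x + 1

print(x.requires_grad)      # True
print(x_new.requires_grad)  # False
print(x_new.grad_fn)        # None
```

Because x_new neither requires grad nor has a grad_fn, anything computed from it alone has nothing to backpropagate through, which matches the error message above.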

The second piece of code is from the training phase of a neural network, and it works. But if we follow the same reasoning that you explained, the second code shouldn't work either: it is in a for loop, and at the end of each iteration the gradient history of both w1 and w2 is cleared, while in the next iteration they are used again in loss.backward().
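
One way to probe the difference between the two snippets is to shrink the training loop down to a single weight tensor and check its flags after each update. This is a rough sketch, not the tutorial's code: the names w and grads, the toy loss and the learning rate 0.1 are all made up for illustration. It suggests two things. First, `w -= ...` is an in-place update, so w stays the same leaf tensor with requires_grad=True, whereas `x = x + 1` rebinds x to a new tensor created under torch.no_grad(). Second, the forward pass inside the loop builds a fresh graph on every iteration, so backward never needs the graph from the previous one.

```python
import torch

w = torch.tensor([2., 2.], requires_grad=True)
grads = []  # gradient observed at each step (illustrative bookkeeping)

for step in range(2):
    # Forward pass: a new graph is built here on every iteration.
    loss = (w ** 2 + w).sum()
    loss.backward()
    grads.append(w.grad.clone())

    with torch.no_grad():
        # In-place update: modifies the existing leaf tensor,
        # and the update itself is not recorded by autograd.
        w -= 0.1 * w.grad
        w.grad.zero_()

    # w is still a leaf that requires grad after the update.
    print(step, w.requires_grad, w.is_leaf)
```

Replacing `w -= 0.1 * w.grad` with `w = w - 0.1 * w.grad` inside the no_grad block would instead produce a new tensor with requires_grad=False, and the next loss.backward() would fail in the same way as the x = x+1 example.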