Hi,
I am confused by the concept of torch.no_grad(). According to the PyTorch tutorials, "You can also stop autograd from tracking history on Tensors with `.requires_grad=True` by wrapping the code block in `with torch.no_grad():`". Now look at this code:
import torch

x = torch.tensor([2., 2.], requires_grad=True)
y = x**2 + x
z = y.sum()
z.backward()
print(x.grad)
with torch.no_grad():
    x = x + 1
z.backward()
print(x.grad)
The first print works correctly, but the second z.backward() fails with “RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.”, which makes sense based on the PyTorch website.

Now my question: in the training loop of a deep neural network there seems to be a similar situation. In each iteration the weight update is done inside torch.no_grad(), so the update is not included in the history of the weights, yet in the next iteration autograd needs the history of the weights to calculate the gradients. Does PyTorch behave differently in these two scenarios?
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()
    print(t, loss.item())
    loss.backward()
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad
        w1.grad.zero_()
        w2.grad.zero_()
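
To make the comparison concrete, here is a small sketch that puts the two cases side by side. It is my own toy example, not the tutorial code: the names x_before, w, w_before and the step size 0.1 are made up for illustration. As far as I can tell, the difference is that `x = x + 1` creates a new tensor and rebinds the name, while `w -= ...` modifies the same leaf tensor in place, and the loop rebuilds the graph with a fresh forward pass in every iteration.

```python
import torch

# Case 1: an out-of-place op under no_grad rebinds the name to a NEW tensor.
x = torch.tensor([2., 2.], requires_grad=True)
x_before = x                      # keep a handle on the original leaf
with torch.no_grad():
    x = x + 1                     # result is not tracked by autograd
print(x is x_before)              # False: the name now points to another tensor
print(x.requires_grad)            # False: the new tensor has no history
print(x_before.requires_grad)     # True: the original leaf is unchanged

# Case 2: an in-place update under no_grad keeps the same leaf tensor.
w = torch.tensor([2., 2.], requires_grad=True)
w_before = w
for step in range(2):
    loss = (w ** 2 + w).sum()     # fresh forward pass, so a fresh graph
    loss.backward()               # backward runs once per graph, no error
    with torch.no_grad():
        w -= 0.1 * w.grad         # hypothetical step size, update not recorded
        w.grad.zero_()
print(w is w_before)              # True: still the same tensor object
print(w.requires_grad, w.is_leaf) # True True
```

If this reading is right, the RuntimeError in my first snippet comes from calling z.backward() twice on the same graph, not from torch.no_grad() itself, and the training loop avoids it because loss is recomputed before every backward call.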