In this grad example, it is interesting how the loss ends up affecting the value of w.

In the grad example below, x, y, w, and b are initialized as tensors for the input, output, weight, and bias respectively.
Then in line 21, z is computed in the "forward" pass (element-wise w*x + b), to be compared against y later:

z=torch.add(torch.mul(w, x), b)

and the loss is computed by comparing z against y (a sum of squared errors):
loss = (y-z).pow(2).sum()

This backward() call seems to compute the gradients, and looking at what gets updated, it appears to populate w.grad and b.grad.
loss.backward()

The interesting thing is how computing the loss affects w. How would loss know to update w.grad and b.grad in this example?

Here is what I saw by stepping through it line by line:

> /gg/git/codelab/gpu/ml/pt/ml-with-pt-and-sk/ch13-pytorch-mechanics/p415-vector-grad-basic.py(24)<module>()
-> loss.backward()
(Pdb) print(w.grad)
None
(Pdb) print(b.grad)
None
(Pdb) l
19  for i in range(0, 5):
20      print("-------- ", i, " ---------")
21      z=torch.add(torch.mul(w, x), b)
22      loss = (y-z).pow(2).sum()
23
24 ->   loss.backward()
25      print("loss: ", loss)
26      print("w: ", w, type(w))
27      print("b: ", b, type(b))
28      print('dL/dw : ', w.grad, type(w.grad))
29      print('dL/db : ', b.grad, type(b.grad))
(Pdb) n
> /gg/git/codelab/gpu/ml/pt/ml-with-pt-and-sk/ch13-pytorch-mechanics/p415-vector-grad-basic.py(25)<module>()
-> print("loss: ", loss)
(Pdb) print(w.grad)
tensor([ 0.6732, -0.0102, -0.3356, 0.1399, -0.0511], device='cuda:0')
(Pdb) print(b.grad)
tensor([ 0.7562, -0.3724, -0.3716, 0.2598, -0.0698], device='cuda:0')
(Pdb)
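To double-check the numbers pdb printed above, here is a minimal standalone sketch (not part of the original script) that re-derives the same gradients by hand, using dL/dw = 2*x*((w*x+b)-y) and dL/db = 2*((w*x+b)-y):

import torch

torch.manual_seed(1)
TENSOR_SIZE = 5

# same shapes as the full script below, but on CPU for simplicity
w = torch.rand(TENSOR_SIZE, requires_grad=True)
b = torch.rand(TENSOR_SIZE, requires_grad=True)
x = torch.rand(TENSOR_SIZE)
y = torch.rand(TENSOR_SIZE)

z = w * x + b                    # forward pass
loss = (y - z).pow(2).sum()      # sum of squared errors
loss.backward()                  # populates w.grad and b.grad

# hand-derived gradients of the sum-of-squares loss
manual_dw = 2 * x * (z.detach() - y)
manual_db = 2 * (z.detach() - y)

print(torch.allclose(w.grad, manual_dw))   # expect True
print(torch.allclose(b.grad, manual_db))   # expect True

(The exact values will not match the cuda:0 tensors above, since the device and seeding order differ; the point is only the check against the hand-derived formula.)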

CODE EXAMPLE:

cat -n p415-vector-grad-basic.py
1 import torch
2 import code
3 cuda = torch.device('cuda')
4
5 # Create weight and bias values.
6
7 TENSOR_SIZE=5
8 w=torch.rand(TENSOR_SIZE, requires_grad=True, device='cuda')
9 b=torch.rand(TENSOR_SIZE, requires_grad=True, device='cuda')
10
11 torch.manual_seed(1)
12
13 # Create input(x), output (y, expected).
14 # input(x) used for forward pass: z=wx+b; z is the computed y, as opposed to the expected y. diff=(z-y)
15
16 x=torch.rand(TENSOR_SIZE, device='cuda')
17 y=torch.rand(TENSOR_SIZE, device='cuda')
18
19 for i in range(0, 5):
20     print("-------- ", i, " ---------")
21     z=torch.add(torch.mul(w, x), b)
22     loss = (y-z).pow(2).sum()
23
24     loss.backward()
25     print("loss: ", loss)
26     print("w: ", w, type(w))
27     print("b: ", b, type(b))
28     print('dL/dw : ', w.grad, type(w.grad))
29     print('dL/db : ', b.grad, type(b.grad))
30
31     # verifying output of loss.backward…
32
33     print("verifying output of loss.backward…(compare with DL/DW)")
34     test1=2 * x * ((w*x+b)-y)
35     print("dL/dw : ", w.grad)
36     print("t : ", test1[:5])
37
38     # update weights
39
40     w1 = w + w.grad
41     b1 = b + b.grad
42     w=w1.detach()
43     w.requires_grad=True
44     b=b1.detach()
45     b.requires_grad=True
46
47     print("new updated w1/b1: ")
48     print("w: ", w, type(w))
49     print("b: ", b, type(b))
50
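A side note on the weight update in lines 40-45: re-creating w and b via detach() and switching requires_grad back on is one way to get fresh leaf tensors for the next iteration. A more conventional pattern, sketched below with a made-up learning rate lr and the usual gradient-descent sign, updates the leaves in place under torch.no_grad() and then zeroes the accumulated gradients:

lr = 0.1  # hypothetical learning rate, not part of the original script

with torch.no_grad():
    w -= lr * w.grad   # in-place update keeps w the same leaf tensor
    b -= lr * b.grad
w.grad.zero_()         # clear accumulated gradients before the next backward()
b.grad.zero_()

Without the zero_() calls, the next backward() would add new gradients on top of the old ones, because autograd accumulates into .grad rather than overwriting it.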

Hi,
w and b are leaf nodes in the computational graph of loss.

As you call backward on the loss, gradients flow through the backward graph and the grad attribute of the leaf nodes is populated.
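For example (a minimal sketch, separate from the script above), you can see how loss is connected back to w and b through grad_fn, and that only leaf tensors with requires_grad=True get their .grad populated:

import torch

w = torch.rand(5, requires_grad=True)   # leaf tensor
b = torch.rand(5, requires_grad=True)   # leaf tensor
x = torch.rand(5)                       # leaf, but requires_grad=False
y = torch.rand(5)

z = w * x + b
loss = (y - z).pow(2).sum()

print(w.is_leaf, b.is_leaf)   # True True
print(z.grad_fn)              # <AddBackward0 ...>: z remembers how it was built
print(loss.grad_fn)           # <SumBackward0 ...>: the root of the backward graph

loss.backward()
print(w.grad is not None)     # True: gradient was written to the leaf
print(x.grad)                 # None: requires_grad is False, so nothing is tracked for x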

Does that answer your question?

Yes, very nice answer and direction. Is there any good documentation on the computational graph?
I just did a quick search and found the following:

The official docs suffice. Feel free to post here if you have any doubts.