# In this grad example, it's interesting how the loss affects the value of w

In the grad example below, x, y, w, and b are initialized as tensors for the input, output, weight, and bias respectively.
Then, at line 21, z is computed in the "forward" pass, to be compared against y later.

The loss is then computed by comparing z against y:
loss = (y-z).pow(2).sum()

This backward call seems to compute the gradients. Looking at what gets updated, it appears to be updating w.grad and b.grad.
loss.backward()

The interesting thing is how computing the loss affects w. How does loss.backward() know to update w.grad and b.grad in this example?
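A minimal sketch of the mechanism (CPU tensors and made-up values here, just for illustration): any tensor created with requires_grad=True becomes a leaf of the graph autograd builds during the forward pass, and loss.backward() fills in the .grad of exactly those leaves:

```python
import torch

# Leaf tensors: created directly with requires_grad=True,
# so autograd will populate their .grad after backward().
w = torch.tensor([2.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

x = torch.tensor([3.0])   # input: no gradient needed
y = torch.tensor([10.0])  # expected output

z = w * x + b                  # forward pass; z = 7, and z records a grad_fn
loss = (y - z).pow(2).sum()    # scalar loss = (10 - 7)^2 = 9

loss.backward()                # walks the recorded graph back to the leaves

# Analytically: dL/dw = 2*(z - y)*x = 2*(7 - 10)*3 = -18
#               dL/db = 2*(z - y)   = 2*(7 - 10)   = -6
print(w.grad)  # tensor([-18.])
print(b.grad)  # tensor([-6.])
```

x and y get no .grad because they were not created with requires_grad=True; backward only populates gradients for leaves that requested them.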

Here is what I observed by stepping through line by line:

```
None
None
(Pdb) l
 19     for i in range(0, 5):
 20         print("-------- ", i, " ---------")
 21         z = w*x + b
 22         loss = (y-z).pow(2).sum()
 23
 24  ->     loss.backward()
 25         print("loss: ", loss)
 26         print("w: ", w, type(w))
 27         print("b: ", b, type(b))
(Pdb) n
-> print("loss: ", loss)
tensor([ 0.6732, -0.0102, -0.3356,  0.1399, -0.0511], device='cuda:0')
tensor([ 0.7562, -0.3724, -0.3716,  0.2598, -0.0698], device='cuda:0')
(Pdb)
```

CODE EXAMPLE:

```
 1 import torch
 2 import code
 3 cuda = torch.device('cuda')
 4
 5 # Create weight and bias values.
 6
 7 TENSOR_SIZE=5
 8 w = torch.rand(TENSOR_SIZE, device='cuda', requires_grad=True)  # (reconstructed) requires_grad=True so w.grad gets populated
 9 b = torch.rand(TENSOR_SIZE, device='cuda', requires_grad=True)  # (reconstructed)
10
11 torch.manual_seed(1)
12
13 # Create input (x) and expected output (y).
14 # x is used for the forward pass z = w*x + b; z is the computed y, compared against the expected y: diff = (z-y)
15
16 x = torch.rand(TENSOR_SIZE, device='cuda')
17 y = torch.rand(TENSOR_SIZE, device='cuda')
18
19 for i in range(0, 5):
20     print("-------- ", i, " ---------")
21     z = w*x + b  # (reconstructed) forward pass
22     loss = (y-z).pow(2).sum()
23
24     loss.backward()
25     print("loss: ", loss)
26     print("w: ", w, type(w))
27     print("b: ", b, type(b))
30
31     # verifying output of loss.backward...
32
33     print("verifying output of loss.backward...(compare with DL/DW)")
34     test1 = 2 * x * ((w*x + b) - y)
36     print("t : ", test1[:5])
37
38     # update weights
39
40     w1 = w + w.grad  # note: adding the gradient increases the loss; subtract it to minimize
41     b1 = b + b.grad
42     w = w1.detach().requires_grad_(True)  # detach() alone drops requires_grad; re-enable it for the next iteration
44     b = b1.detach().requires_grad_(True)
46
47     print("new updated w1/b1: ")
48     print("w: ", w, type(w))
49     print("b: ", b, type(b))
50
```
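The verification step from the listing can also be run on CPU and checked numerically with torch.allclose (a sketch, with the same seed and tensor size as the listing but no CUDA, so it runs anywhere):

```python
import torch

torch.manual_seed(1)
TENSOR_SIZE = 5

x = torch.rand(TENSOR_SIZE)
y = torch.rand(TENSOR_SIZE)
w = torch.rand(TENSOR_SIZE, requires_grad=True)
b = torch.rand(TENSOR_SIZE, requires_grad=True)

z = w * x + b
loss = (y - z).pow(2).sum()
loss.backward()

# Analytic gradients of loss = sum((y - z)^2) with z = w*x + b:
#   dL/dw = 2*x*((w*x + b) - y)
#   dL/db = 2*((w*x + b) - y)
test_w = (2 * x * ((w * x + b) - y)).detach()
test_b = (2 * ((w * x + b) - y)).detach()

print(torch.allclose(w.grad, test_w))  # True
print(torch.allclose(b.grad, test_b))  # True
```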

Hi,
w and b are leaf nodes in the computational graph of the loss.

As you call backward on the loss, gradients flow through the backward graph and the grad attribute of the leaf nodes is populated.

Yes, very nice answer and direction. Is there any good documentation on the computational graph?
I quickly searched and found the following:

The official docs suffice. Feel free to post here if you have any doubts.