In the grad example below, x, y, w, and b are initialized as tensors holding the input, expected output, weight, and bias, respectively.

Then, in line 21, z is computed in the "forward" pass, to be compared against y later:

z=torch.add(torch.mul(w, x), b)
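As an aside, torch.mul and torch.add are elementwise, so this is just the operator form w * x + b. A minimal sanity check (5-element CPU tensors, matching TENSOR_SIZE in the listing below):

import torch
w = torch.rand(5, requires_grad=True)
b = torch.rand(5, requires_grad=True)
x = torch.rand(5)
z1 = torch.add(torch.mul(w, x), b)  # explicit function form
z2 = w * x + b                      # operator form, same result
print(torch.allclose(z1, z2))       # True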

Next, in line 22, the loss is computed by comparing z against y:

loss = (y-z).pow(2).sum()
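Since z = w*x + b elementwise and loss = sum((y - z)^2), the chain rule gives, per element:

dL/dw = 2*(z - y)*x = 2*x*((w*x + b) - y)
dL/db = 2*(z - y) = 2*((w*x + b) - y)

This is exactly what the test1 line near the bottom of the listing computes for dL/dw.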

In line 24, loss.backward() computes the gradients. As for where they end up, stepping through shows that it populates w.grad and b.grad.

loss.backward()

The interesting thing is how calling backward() on loss affects w. How does loss know to update w.grad and b.grad in this example? The answer is that autograd records every operation involving a requires_grad=True tensor in a computation graph: loss carries a grad_fn that links back through z to the leaf tensors w and b, and backward() walks that graph and deposits the gradients into the leaves' .grad fields.
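You can see the recorded graph by inspecting grad_fn. A minimal sketch on CPU (random values; only the graph structure matters here):

import torch
w = torch.rand(5, requires_grad=True)
b = torch.rand(5, requires_grad=True)
x = torch.rand(5)
y = torch.rand(5)
z = torch.add(torch.mul(w, x), b)
loss = (y - z).pow(2).sum()
print(z.grad_fn)                    # <AddBackward0 ...>
print(loss.grad_fn)                 # <SumBackward0 ...>
print(loss.grad_fn.next_functions)  # ((<PowBackward0 ...>, 0),)
# backward() walks this chain and fills in w.grad and b.grad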

Here is how I verified it, stepping through line by line:

> /gg/git/codelab/gpu/ml/pt/ml-with-pt-and-sk/ch13-pytorch-mechanics/p415-vector-grad-basic.py(24)<module>()
-> loss.backward()

(Pdb) print(w.grad)

None

(Pdb) print(b.grad)

None

(Pdb) l

 19     for i in range(0, 5):
 20         print("-------- ", i, " ---------")
 21         z=torch.add(torch.mul(w, x), b)
 22         loss = (y-z).pow(2).sum()
 23
 24 ->      loss.backward()
 25         print("loss: ", loss)
 26         print("w: ", w, type(w))
 27         print("b: ", b, type(b))
 28         print('dL/dw : ', w.grad, type(w.grad))
 29         print('dL/db : ', b.grad, type(b.grad))

(Pdb) n

> /gg/git/codelab/gpu/ml/pt/ml-with-pt-and-sk/ch13-pytorch-mechanics/p415-vector-grad-basic.py(25)<module>()
-> print("loss: ", loss)

(Pdb) print(w.grad)

tensor([ 0.6732, -0.0102, -0.3356, 0.1399, -0.0511], device='cuda:0')

(Pdb) print(b.grad)

tensor([ 0.7562, -0.3724, -0.3716, 0.2598, -0.0698], device='cuda:0')

(Pdb)
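As a cross-check, torch.autograd.grad returns the same gradients directly as a tuple, without touching .grad. A minimal sketch (CPU, same shapes as the listing):

import torch
w = torch.rand(5, requires_grad=True)
b = torch.rand(5, requires_grad=True)
x = torch.rand(5)
y = torch.rand(5)
z = torch.add(torch.mul(w, x), b)
loss = (y - z).pow(2).sum()
dw, db = torch.autograd.grad(loss, [w, b])  # gradients returned directly, .grad left untouched
print(torch.allclose(dw, 2 * x * ((w * x + b) - y)))  # True
print(torch.allclose(db, 2 * ((w * x + b) - y)))      # True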

CODE EXAMPLE:

cat -n p415-vector-grad-basic.py

 1  import torch
 2  import code
 3  cuda = torch.device('cuda')
 4
 5  # Create weight and bias values.
 6
 7  TENSOR_SIZE=5
 8  w=torch.rand(TENSOR_SIZE, requires_grad=True, device='cuda')
 9  b=torch.rand(TENSOR_SIZE, requires_grad=True, device='cuda')
10
11  torch.manual_seed(1)
12
13  # Create input (x) and expected output (y).
14  # x is used for the forward pass z = w*x + b; z is the computed y (vs. the expected y): diff = (z - y).
15
16  x=torch.rand(TENSOR_SIZE, device='cuda')
17  y=torch.rand(TENSOR_SIZE, device='cuda')
18
19  for i in range(0, 5):
20      print("-------- ", i, " ---------")
21      z=torch.add(torch.mul(w, x), b)
22      loss = (y-z).pow(2).sum()
23
24      loss.backward()
25      print("loss: ", loss)
26      print("w: ", w, type(w))
27      print("b: ", b, type(b))
28      print('dL/dw : ', w.grad, type(w.grad))
29      print('dL/db : ', b.grad, type(b.grad))
30
31      # verifying output of loss.backward...
32
33      print("verifying output of loss.backward...(compare with dL/dw)")
34      test1=2 * x * ((w*x+b)-y)
35      print("dL/dw : ", w.grad)
36      print("t : ", test1[:5])
37
38      # update weights: step against the gradient (0.1 is a small learning rate)
39
40      w1 = w - 0.1*w.grad
41      b1 = b - 0.1*b.grad
42      w=w1.detach()
43      w.requires_grad=True
44      b=b1.detach()
45      b.requires_grad=True
46
47      print("new updated w1/b1: ")
48      print("w: ", w, type(w))
49      print("b: ", b, type(b))
50
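The detach()/requires_grad dance at lines 42-45 works, but it creates fresh leaf tensors on every iteration. The idiomatic way is to update the weights in place under torch.no_grad() and then zero the accumulated gradients. A minimal sketch of the same loop (CPU, learning rate 0.1 chosen arbitrarily):

import torch
TENSOR_SIZE = 5
w = torch.rand(TENSOR_SIZE, requires_grad=True)
b = torch.rand(TENSOR_SIZE, requires_grad=True)
x = torch.rand(TENSOR_SIZE)
y = torch.rand(TENSOR_SIZE)
for i in range(5):
    z = torch.add(torch.mul(w, x), b)
    loss = (y - z).pow(2).sum()
    loss.backward()
    with torch.no_grad():      # keep the update out of the autograd graph
        w -= 0.1 * w.grad      # in-place gradient descent step
        b -= 0.1 * b.grad
    w.grad.zero_()             # .grad accumulates across backward() calls, so reset it
    b.grad.zero_()
    print(i, loss.item())      # loss shrinks across iterations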