I extended the book's sample on how autograd works, as follows.
The original example used a single sample of test data with one feature, so I extended it to 3 samples with 5 features (SAMPLE_SIZE=3, FEATURE_SIZE=5).
The example is extremely simple: it trains a model with a hand-written forward pass, computes the backward pass by hand as well, and compares that result against loss.backward() to show how backward() really works. Great.
Now, instead of one sample, I took two samples and ran them through the network. The forward multiplication works, but the comparison of loss.backward() against the manual computation fails. What did I do wrong here?
In the third example below, I took two samples but still iterate over them one by one in the loop. The comparison works there, but I am not sure this is really how training works. I'd rather get the second example below to work, so that the forward pass sweeps through both samples simultaneously without the backward comparison failing.
EXAMPLE 1 (one sample per loop)
cat nn-manual-1.py
import torch
import code
cuda = torch.device('cuda')

# Create weight and bias values.

CONFIG_ENABLE_TEST=1
FEATURE_SIZE=5
SAMPLE_SIZE=3
w=torch.rand(FEATURE_SIZE, requires_grad=True, device='cuda')
b=torch.rand(FEATURE_SIZE, requires_grad=True, device='cuda')

# Note: the seed is set after w and b are created, so w and b differ from run to run.
torch.manual_seed(1)

# Create input (x) and expected output (y).
# x is used for the forward pass: z=w*x+b, where z is the computed output rather than the expected y. diff=(z-y)

x_data=torch.rand([SAMPLE_SIZE, FEATURE_SIZE], device='cuda')
y_data=torch.rand(FEATURE_SIZE, device='cuda')
x=x_data[0:1]
y=y_data

for i in range(0, 2):
    print("-------- ", i, " ---------")
    z=torch.add(torch.mul(w, x), b)
    loss = (y-z).pow(2).sum()

    loss.backward()
    print("loss: ", loss)
    print("w: ", w, type(w))
    print("b: ", b, type(b))
    print('dL/dw : ', w.grad, type(w.grad))
    print('dL/db : ', b.grad, type(b.grad))

    # verifying output of loss.backward...

    print("verifying output of loss.backward...(compare with DL/DW)")

    # test1 = DL/DW = DL/DZ * DZ/DW
    # 1. DL/DZ = D/DZ (y-z)**2 = D/DZ (y**2 - 2yz + z**2) = -2y + 2z = 2(z-y)
    # 2. DZ/DW = D/DW (w*x + b) = x
    # 3. DL/DW = DL/DZ * DZ/DW = 2(z-y)x = 2x(z-y) = 2x((w*x+b)-y)

    # test2 = DL/DB = DL/DZ * DZ/DB
    # 1a. same as 1.
    # 2a. DZ/DB = D/DB (w*x + b) = 1
    # 3a. DL/DB = DL/DZ * DZ/DB = 2(z-y) * 1 = 2(z-y) = 2((w*x+b)-y)

    test1=2 * x * ((w*x+b)-y)
    print("dL/dw : ", w.grad)
    print("t1 : ", test1)

    test2=2 * ((w*x + b) - y)
    print("dL/db : ", b.grad)
    print("t2 : ", test2)

    # update weights

    w1 = w + w.grad
    b1 = b + b.grad
    w=w1.detach()
    w.requires_grad=True
    b=b1.detach()
    b.requires_grad=True

    print("new updated w1/b1: ")
    print("w: ", w, type(w))
    print("b: ", b, type(b))

[root@ixt-hq-180 nn-manual]# python3 nn-manual-1.py
-------- 0 ---------
loss: tensor(0.8684, device='cuda:0', grad_fn=<SumBackward0>)
w: tensor([0.2686, 0.3329, 0.0467, 0.8056, 0.8514], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([0.2696, 0.3146, 0.6340, 0.8193, 0.2254], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
dL/dw : tensor([-0.2908, 0.0051, -0.0749, 0.9804, -0.0844], device='cuda:0') <class 'torch.Tensor'>
dL/db : tensor([-0.3266, 0.1840, -0.0829, 1.8201, -0.1154], device='cuda:0') <class 'torch.Tensor'>
verifying output of loss.backward...(compare with DL/DW)
dL/dw : tensor([-0.2908, 0.0051, -0.0749, 0.9804, -0.0844], device='cuda:0')
t1 : tensor([[-0.2908, 0.0051, -0.0749, 0.9804, -0.0844]], device='cuda:0',
grad_fn=<MulBackward0>)
dL/db : tensor([-0.3266, 0.1840, -0.0829, 1.8201, -0.1154], device='cuda:0')
t2 : tensor([[-0.3266, 0.1840, -0.0829, 1.8201, -0.1154]], device='cuda:0',
grad_fn=<MulBackward0>)
new updated w1/b1:
w: tensor([-0.0222, 0.3380, -0.0282, 1.7859, 0.7670], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([-0.0570, 0.4986, 0.5511, 2.6394, 0.1100], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
-------- 1 ---------
loss: tensor(11.3448, device='cuda:0', grad_fn=<SumBackward0>)
w: tensor([-0.0222, 0.3380, -0.0282, 1.7859, 0.7670], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([-0.0570, 0.4986, 0.5511, 2.6394, 0.1100], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
dL/dw : tensor([-1.3333, 0.0152, -0.3468, 3.5099, -0.3434], device='cuda:0') <class 'torch.Tensor'>
dL/db : tensor([-1.4976, 0.5523, -0.3840, 6.5164, -0.4697], device='cuda:0') <class 'torch.Tensor'>
verifying output of loss.backward...(compare with DL/DW)
dL/dw : tensor([-1.3333, 0.0152, -0.3468, 3.5099, -0.3434], device='cuda:0')
t1 : tensor([[-1.3333, 0.0152, -0.3468, 3.5099, -0.3434]], device='cuda:0',
grad_fn=<MulBackward0>)
dL/db : tensor([-1.4976, 0.5523, -0.3840, 6.5164, -0.4697], device='cuda:0')
t2 : tensor([[-1.4976, 0.5523, -0.3840, 6.5164, -0.4697]], device='cuda:0',
grad_fn=<MulBackward0>)
new updated w1/b1:
w: tensor([-1.3555, 0.3531, -0.3749, 5.2959, 0.4235], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([-1.5547, 1.0509, 0.1671, 9.1558, -0.3597], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
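A side note on the update step: the loss in the output above rises from 0.8684 to 11.3448 between the two iterations, which is consistent with w + w.grad stepping up the gradient rather than down it. Below is a minimal sketch of a descent-style update. It assumes a hypothetical learning rate of 0.1 and runs on the CPU so it does not need a GPU; neither choice comes from the original script, and the random data differs from the run above.

```python
import torch

torch.manual_seed(1)  # seed first so w, b, x and y are all repeatable

FEATURE_SIZE = 5
LEARNING_RATE = 0.1  # hypothetical value, not taken from the original script

w = torch.rand(FEATURE_SIZE, requires_grad=True)
b = torch.rand(FEATURE_SIZE, requires_grad=True)
x = torch.rand(1, FEATURE_SIZE)  # one sample, as in Example 1
y = torch.rand(FEATURE_SIZE)

losses = []
for step in range(5):
    z = w * x + b
    loss = (y - z).pow(2).sum()
    losses.append(loss.item())
    loss.backward()

    # Step against the gradient; no_grad keeps the update out of the graph.
    with torch.no_grad():
        w -= LEARNING_RATE * w.grad
        b -= LEARNING_RATE * b.grad

    # Clear the gradients, otherwise the next backward() adds to them.
    w.grad.zero_()
    b.grad.zero_()

print(losses)
```

Updating in place under torch.no_grad() keeps w and b as the same leaf tensors, so the detach() and requires_grad=True steps are not needed in this variant.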
EXAMPLE 2 (two samples per loop; the lines below show the only difference from Example 1 above)
delta:
x=x_data[0:2]
output:
dL/dw : tensor([-1.3059, 0.1650, -0.0976, 1.7289, -0.1745], device='cuda:0')
t1 : tensor([[-0.6494, -0.0013, -0.0579, 0.7400, -0.1629],
 [-0.6565, 0.1663, -0.0397, 0.9889, -0.0115]], device='cuda:0',
 grad_fn=<MulBackward0>)
cat nn-manual-1.py ; python3 nn-manual-1.py
import torch
import code
cuda = torch.device('cuda')
# Create weight and bias values.
CONFIG_ENABLE_TEST=1
FEATURE_SIZE=5
SAMPLE_SIZE=3
w=torch.rand(FEATURE_SIZE, requires_grad=True, device='cuda')
b=torch.rand(FEATURE_SIZE, requires_grad=True, device='cuda')
torch.manual_seed(1)
# Create input (x) and expected output (y).
# x is used for the forward pass: z=w*x+b, where z is the computed output rather than the expected y. diff=(z-y)
x_data=torch.rand([SAMPLE_SIZE, FEATURE_SIZE], device='cuda')
y_data=torch.rand(FEATURE_SIZE, device='cuda')
x=x_data[0:2]
y=y_data
for i in range(0, 2):
    print("-------- ", i, " ---------")
    z=torch.add(torch.mul(w, x), b)
    loss = (y-z).pow(2).sum()
    loss.backward()
    print("loss: ", loss)
    print("w: ", w, type(w))
    print("b: ", b, type(b))
    print('dL/dw : ', w.grad, type(w.grad))
    print('dL/db : ', b.grad, type(b.grad))
    # verifying output of loss.backward...
    print("verifying output of loss.backward...(compare with DL/DW)")
    # test1 = DL/DW = DL/DZ * DZ/DW
    # 1. DL/DZ = D/DZ (y-z)**2 = D/DZ (y**2 - 2yz + z**2) = -2y + 2z = 2(z-y)
    # 2. DZ/DW = D/DW (w*x + b) = x
    # 3. DL/DW = DL/DZ * DZ/DW = 2(z-y)x = 2x(z-y) = 2x((w*x+b)-y)
    # test2 = DL/DB = DL/DZ * DZ/DB
    # 1a. same as 1.
    # 2a. DZ/DB = D/DB (w*x + b) = 1
    # 3a. DL/DB = DL/DZ * DZ/DB = 2(z-y) * 1 = 2(z-y) = 2((w*x+b)-y)
    test1=2 * x * ((w*x+b)-y)
    print("dL/dw : ", w.grad)
    print("t1 : ", test1)
    test2=2 * ((w*x + b) - y)
    print("dL/db : ", b.grad)
    print("t2 : ", test2)
    # update weights
    w1 = w + w.grad
    b1 = b + b.grad
    w=w1.detach()
    w.requires_grad=True
    b=b1.detach()
    b.requires_grad=True
    print("new updated w1/b1: ")
    print("w: ", w, type(w))
    print("b: ", b, type(b))
-------- 0 ---------
loss: tensor(1.8291, device='cuda:0', grad_fn=<SumBackward0>)
w: tensor([0.1335, 0.8721, 0.6207, 0.9881, 0.2463], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([0.1884, 0.1846, 0.1250, 0.4980, 0.6141], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
dL/dw : tensor([-1.3059, 0.1650, -0.0976, 1.7289, -0.1745], device='cuda:0') <class 'torch.Tensor'>
dL/db : tensor([-1.4552, 0.4470, -1.2063, 2.9352, -0.7959], device='cuda:0') <class 'torch.Tensor'>
verifying output of loss.backward...(compare with DL/DW)
dL/dw : tensor([-1.3059, 0.1650, -0.0976, 1.7289, -0.1745], device='cuda:0')
t1 : tensor([[-0.6494, -0.0013, -0.0579, 0.7400, -0.1629],
 [-0.6565, 0.1663, -0.0397, 0.9889, -0.0115]], device='cuda:0',
 grad_fn=<MulBackward0>)
dL/db : tensor([-1.4552, 0.4470, -1.2063, 2.9352, -0.7959], device='cuda:0')
t2 : tensor([[-0.7295, -0.0464, -0.0641, 1.3739, -0.2228],
 [-0.7257, 0.4934, -1.1422, 1.5612, -0.5731]], device='cuda:0',
 grad_fn=<MulBackward0>)
new updated w1/b1:
w: tensor([-1.1725, 1.0371, 0.5231, 2.7170, 0.0718], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([-1.2667, 0.6316, -1.0813, 3.4331, -0.1818], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
-------- 1 ---------
loss: tensor(69.6964, device='cuda:0', grad_fn=<SumBackward0>)
w: tensor([-1.1725, 1.0371, 0.5231, 2.7170, 0.0718], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([-1.2667, 0.6316, -1.0813, 3.4331, -0.1818], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
dL/dw : tensor([-10.7375, 0.5285, -2.5198, 10.9996, -1.5573], device='cuda:0') <class 'torch.Tensor'>
dL/db : tensor([-11.9639, 2.3553, -6.2146, 18.7286, -4.2419], device='cuda:0') <class 'torch.Tensor'>
verifying output of loss.backward...(compare with DL/DW)
dL/dw : tensor([-10.7375, 0.5285, -2.5198, 10.9996, -1.5573], device='cuda:0')
t1 : tensor([[-5.3105, 0.0235, -2.3961, 4.9052, -1.5135],
 [-5.4271, 0.5050, -0.1237, 6.0945, -0.0437]], device='cuda:0',
 grad_fn=<MulBackward0>)
dL/db : tensor([-11.9639, 2.3553, -6.2146, 18.7286, -4.2419], device='cuda:0')
t2 : tensor([[-5.9651, 0.8567, -2.6531, 9.1068, -2.0699],
 [-5.9989, 1.4986, -3.5615, 9.6218, -2.1720]], device='cuda:0',
 grad_fn=<MulBackward0>)
new updated w1/b1:
w: tensor([-11.9100, 1.5656, -1.9967, 13.7167, -1.4854], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
b: tensor([-13.2307, 2.9869, -7.2959, 22.1617, -4.4237], device='cuda:0',
requires_grad=True) <class 'torch.Tensor'>
[root@ixt-hq-180 nn-manual]#
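One possible reading of the mismatch above: with x of shape [2, 5], w and b are broadcast across both rows, and because the loss sums over every element, loss.backward() adds up the contribution of each sample. The manual formulas still produce one row per sample. In the printed output the two t1 rows do add up to dL/dw (for the first feature, -0.6494 + -0.6565 = -1.3059), and the two t2 rows add up to dL/db (-0.7295 + -0.7257 = -1.4552). Below is a minimal sketch of the comparison with the sample dimension summed out. It runs on the CPU with freshly seeded random data, so the numbers will not match the output above, and the names manual_dw and manual_db are made up for illustration.

```python
import torch

torch.manual_seed(1)

FEATURE_SIZE = 5
w = torch.rand(FEATURE_SIZE, requires_grad=True)
b = torch.rand(FEATURE_SIZE, requires_grad=True)
x = torch.rand(2, FEATURE_SIZE)  # two samples processed in one forward pass
y = torch.rand(FEATURE_SIZE)

z = w * x + b                    # w and b broadcast over both rows -> shape [2, 5]
loss = (y - z).pow(2).sum()      # sums over samples and features
loss.backward()

with torch.no_grad():
    per_sample_dw = 2 * x * (z - y)   # shape [2, 5], one row per sample
    per_sample_db = 2 * (z - y)       # shape [2, 5]
    # Each weight is shared by both samples, so its gradient is the sum
    # of the per-sample contributions along the sample dimension.
    manual_dw = per_sample_dw.sum(dim=0)   # shape [5], same as w.grad
    manual_db = per_sample_db.sum(dim=0)   # shape [5], same as b.grad

print(torch.allclose(w.grad, manual_dw, atol=1e-6))
print(torch.allclose(b.grad, manual_db, atol=1e-6))
```

If the loss were divided by the number of samples, the manual gradient would be the mean over dim 0 rather than the sum.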