It's me again. Something weird is happening. When I register the variables as we discussed above in this class (I pasted it in a gist because it is long), not all of them are seen by autograd. There should be 8 tensors in total, but at the end of optimization only 6 of them are shown (relevant part of the output below):
outputs
tensor([-14.5772266388, -14.5772266388])
targets
tensor([-14.5868730545, -14.5640010834])
No diff in intercept_Cu
No diff in slope_Cu
Diff in linears.Cu.0.weight
No diff in linears.Cu.0.bias
Diff in linears.Cu.2.weight
No diff in linears.Cu.2.bias
Diff in linears.Cu.4.weight
No diff in linears.Cu.4.bias
Optimized parameters for Cu symbol
Index 0
Parameter containing:
tensor([[ 1.5998219169e-05, -1.1084647089e-11, 3.4887983702e-07,
-5.9022102505e-05, 1.5358797100e-05, -2.2421262713e-07,
3.9578364522e-05, -2.5841705792e-05],
[-1.2979450847e-11, 2.4929600051e-11, -4.3761640145e-11,
8.7308825414e-07, 6.8003464548e-07, -6.9001464453e-07,
-3.0529092328e-05, -1.9285680537e-06],
[-3.4112128677e-09, -2.0672181415e-12, 1.0248225879e-12,
1.3090937500e-05, -1.9991681199e-08, -1.2244654499e-05,
1.1959917501e-09, -1.1793726173e-07],
[ 9.3959987360e-12, -2.8132822081e-06, -7.1578106144e-06,
-1.5608311514e-06, 7.4273208156e-05, -6.5615221589e-13,
1.0243820725e-04, 2.6734230119e-07],
[-2.8905316867e-05, 1.7972409978e-06, 2.8471620681e-05,
1.1441625247e-06, -4.3263348743e-06, 9.2861837402e-06,
-7.3636897469e-08, -6.2427188823e-06],
[ 1.8716022510e-08, -4.3462468966e-06, -7.1537678559e-11,
4.4766447493e-13, -4.2634189867e-07, 6.2688843006e-10,
-1.5413985643e-09, -1.9352362415e-06],
[-4.0789027480e-06, 1.7624552484e-08, -5.8772937336e-05,
1.3928577259e-12, 1.4477242303e-06, -6.5660731252e-07,
1.3057894830e-04, 1.0623334674e-06],
[ 2.8627397342e-07, 7.6879496191e-07, -1.5201392500e-07,
9.4639290182e-08, 1.7211885250e-09, -3.1544458712e-10,
-3.1436915742e-04, -9.5523216004e-09],
[ 5.4327131238e-07, 5.3367260989e-05, 3.0272097329e-11,
-2.5873794129e-06, -2.5613280741e-07, 4.1264866013e-05,
1.3438809527e-12, -5.6481166411e-09],
[-6.4899657445e-05, -4.3667625960e-08, -6.4955729684e-10,
7.9043999790e-08, -7.7281238191e-06, 1.7655082047e-05,
-1.6245309098e-07, -1.7478591019e-08]], requires_grad=True)
Gradient tensor(0.0126342149)
Index 1
Parameter containing:
tensor([ 0.0846629143, 0.2052433789, 0.1129320264, 0.1384415329,
0.2349925339, -0.1073408127, 0.2195934355, 0.3364700377,
0.1929847300, -0.0893238783], requires_grad=True)
No gradient?
Index 2
Parameter containing:
tensor([[-1.4006408492e-05, -1.3260194009e-06, 1.4346720434e-07,
-6.5448512032e-07, 2.9784255275e-06, 4.5995878547e-13,
6.7223256337e-05, 6.4453017576e-12, 1.0301571401e-10,
-1.2009696349e-08],
[-2.2814828071e-07, -5.8791869151e-08, -3.9165245835e-04,
-2.5221936539e-06, 1.1180619595e-06, -2.6514657293e-05,
-1.4766897038e-07, 2.7023989242e-04, -2.9795790401e-12,
3.4368467823e-06],
[ 3.6120570712e-06, -3.7223298568e-04, 7.1171717408e-09,
-4.0368172449e-06, -1.1812019807e-07, -9.0479334176e-06,
-9.7775303479e-12, 3.3027842505e-07, -2.2225761143e-07,
1.7060537516e-07],
[ 4.7848516260e-05, 1.4109857602e-06, -4.7986867813e-09,
-1.1886934145e-11, -1.5743089534e-06, -1.9210867777e-06,
2.5946489401e-10, 7.1065740485e-05, -7.2540847214e-06,
-2.9720404740e-13],
[ 7.8338234744e-07, 2.9897366403e-05, 1.0493286936e-05,
-1.2905216806e-07, -5.0532015905e-08, -1.4369081327e-05,
5.9140187659e-05, 1.8394788640e-05, 2.8736901004e-04,
-7.9514339557e-11],
[-3.5491411109e-04, 3.9472433855e-06, -3.6779524635e-06,
1.3279050108e-05, 1.0775630388e-09, 2.0076269536e-09,
2.2207383154e-05, 1.0671607924e-05, 3.5179223801e-07,
8.3256582002e-06],
[-4.0831773518e-09, 3.4044984204e-05, 3.9824635678e-07,
-5.4254252291e-07, -8.2707781596e-12, 7.9960360555e-10,
1.6246242751e-07, -1.5748057303e-09, -4.6191617002e-05,
1.4769234986e-04],
[ 6.0335892158e-06, 4.0175755203e-06, 2.3420781872e-05,
-1.4100555745e-07, 4.3824256863e-06, -1.9676244847e-05,
-4.2883926653e-05, 2.6943742341e-05, 1.5044579982e-07,
3.4529236359e-08],
[-2.4134715204e-05, 3.6303499655e-05, -1.0801615247e-07,
8.3609793364e-06, 3.0849619179e-06, -8.6793288574e-06,
2.4900288554e-04, 8.5335452355e-14, -3.4220584699e-11,
-4.0262357288e-06],
[-3.2995540096e-06, -9.5245795251e-08, 2.4340472748e-08,
-3.7661432133e-13, -4.4606429661e-09, -7.5562275015e-06,
-6.9999718107e-05, 1.4586039470e-04, 1.0552175809e-06,
-6.1385714220e-12]], requires_grad=True)
Gradient tensor(-0.0002918996)
Index 3
Parameter containing:
tensor([-0.0368886292, -0.1048975587, -0.2438423038, -0.2089971900,
0.2615807354, 0.0241439044, -0.1016014665, 0.2302859128,
-0.2738550305, -0.2952967882], requires_grad=True)
No gradient?
Index 4
Parameter containing:
tensor([[-1.6471599520e-04, 5.0920876674e-05, 1.6964193492e-05,
-7.2204138633e-06, -7.4410144713e-11, 1.3845928848e-09,
2.6772568162e-07, 4.4445322422e-11, 3.0647162930e-05,
-4.6163746447e-05]], requires_grad=True)
Gradient tensor(-0.0134047084)
Index 5
Parameter containing:
tensor([-0.1564691514], requires_grad=True)
No gradient?
As I read this output, the biases are not counted and only the weights of the layers are retained. Additionally, the loss decreases with each epoch, but the outputs of the model remain the same.
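One check I can think of for the 8 versus 6 mismatch is to compare what the module reports through `named_parameters()` with what the optimizer actually holds. The sketch below is only an illustration: `TinyModel` is a hypothetical stand-in for my class (not the code in the gist), the layer sizes are picked to match the shapes printed above, the `Tanh` activations are an assumption, and `params_missing_from_optimizer` is a helper name I made up.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the class in the gist (not the real one)."""

    def __init__(self, symbol="Cu"):
        super().__init__()
        # Registered directly on the module, so they show up in named_parameters().
        self.register_parameter(f"intercept_{symbol}", nn.Parameter(torch.zeros(1)))
        self.register_parameter(f"slope_{symbol}", nn.Parameter(torch.ones(1)))
        # Sizes match the printed shapes (10x8, 10x10, 1x10); Tanh is an assumption.
        self.linears = nn.ModuleDict({
            symbol: nn.Sequential(
                nn.Linear(8, 10), nn.Tanh(),
                nn.Linear(10, 10), nn.Tanh(),
                nn.Linear(10, 1),
            )
        })


def params_missing_from_optimizer(model, optimizer):
    """Return the names of model parameters that the optimizer does not hold."""
    held = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return [name for name, p in model.named_parameters() if id(p) not in held]


model = TinyModel()
all_names = [name for name, _ in model.named_parameters()]

# Optimizer built from the sub-network only: it holds 6 tensors.
partial = torch.optim.Adam(model.linears["Cu"].parameters(), lr=1e-3)
# Optimizer built from the whole module: it holds all 8 tensors.
full = torch.optim.Adam(model.parameters(), lr=1e-3)

print(len(all_names), "registered:", all_names)
print("missing from partial optimizer:", params_missing_from_optimizer(model, partial))
print("missing from full optimizer:", params_missing_from_optimizer(model, full))
```

If the same helper run on my real model and optimizer lists `intercept_Cu` and `slope_Cu` as missing, that would at least explain why only 6 tensors are printed, although I cannot say yet whether that is what happens in my class.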
Do you see any problem in the class I have built? What would you recommend checking? I would really appreciate any suggestions. I am lost here.
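In case it helps to reproduce the "Diff / No diff" and "No gradient?" checks in isolation, here is a minimal sketch of how one optimization step could be inspected per parameter. Everything in it is an assumption for illustration: the small `nn.Sequential` network, the SGD optimizer, the random inputs, and the helper name `report_one_step` are not taken from my gist; only the two target values are copied from the output above.

```python
import torch
from torch import nn


def report_one_step(model, optimizer, loss_fn, inputs, targets):
    """Run one optimization step; report each parameter's grad norm and whether it changed."""
    # Snapshot the values before the update (clone, so later in-place updates do not alias).
    before = {name: p.detach().clone() for name, p in model.named_parameters()}

    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()

    # p.grad stays None for parameters that did not take part in the forward pass.
    grad_norms = {
        name: None if p.grad is None else p.grad.norm().item()
        for name, p in model.named_parameters()
    }
    optimizer.step()

    report = {}
    for name, p in model.named_parameters():
        report[name] = {
            "grad_norm": grad_norms[name],
            "changed": not torch.equal(before[name], p.detach()),
        }
    return loss.item(), report


torch.manual_seed(0)
# Toy network and data, not the model from the gist.
model = nn.Sequential(nn.Linear(8, 10), nn.Tanh(), nn.Linear(10, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
inputs = torch.randn(2, 8)
targets = torch.tensor([[-14.5868730545], [-14.5640010834]])

loss, report = report_one_step(model, optimizer, nn.MSELoss(), inputs, targets)
print("loss:", loss)
for name, row in report.items():
    print(name, "| grad norm:", row["grad_norm"], "| changed:", row["changed"])
```

In this toy setup every parameter, biases included, should report a gradient and a change after one step, so if my real model behaves differently under the same check, the difference is probably in how the class or the optimizer is wired rather than in autograd itself.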