Hello,
I have a CNN model with a layer that takes input from the previous layer and computes an output, but this output is not used at all. Hence I expect that backpropagation should not affect its weights.
# ... my model's __init__() part
self.alpha = alpha_model

def forward(self, x, h, hist):
    alpha = self.alpha(hist)  # output computed here but never used below
    # x = torch.pow(x, alpha)
    # h = torch.pow(h, alpha)
    ....
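To make my expectation concrete, here is a minimal, self-contained sketch (the Net/main/unused names are hypothetical, not my actual model) of the behaviour I expected: a submodule whose output is computed in forward() but discarded should receive no gradient, and Adam skips parameters whose .grad is None.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Linear(10, 1)    # contributes to the loss
        self.unused = nn.Linear(10, 1)  # plays the role of alpha_model

    def forward(self, x):
        _ = self.unused(x)  # computed, but the result is discarded
        return self.main(x)

net = Net()
opt = torch.optim.Adam(net.parameters(), lr=1e-4)
loss = net(torch.randn(4, 10)).mean()
loss.backward()
print(net.unused.weight.grad)  # None: the loss does not depend on this layer
opt.step()                     # Adam skips parameters whose grad is None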
As you can see, alpha_model is another CNN that this model calls. alpha_model does some computation, but I am no longer using its output; see the commented-out lines, which are the only place alpha_model was ever used.
Why, then, if I print the weights of the last layer of alpha_model, are the weights changing by 0.0001 (which is my learning rate)? Since it does not contribute to the output, the gradients should be 0. Is this because I use the Adam optimizer?
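This is roughly how I print the weights and gradients after each optimizer step (assuming fc4 is directly accessible as an attribute of alpha_model):

print("fc4 weights", alpha_model.fc4.weight)
print("--")
print("fc4 grad", alpha_model.fc4.weight.grad)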
Here is a sample output, where fc4 is the last layer of alpha_model:
fc4 weights Parameter containing:
tensor([[ 0.1044, -0.0683, -0.1153, -0.0387, -0.0276, 0.1335, -0.0946, 0.0927,
0.0041, 0.1364, -0.0192, -0.0634, -0.0361, 0.0884, 0.1091, -0.0954,
0.1241, 0.1089, -0.0930, 0.0839, 0.0144, 0.0735, 0.0217, 0.0746,
-0.0384, -0.0422, -0.0879, 0.0786, 0.0737, -0.0474, 0.1309, -0.0705,
-0.0487, -0.1311, -0.0782, -0.0974, 0.0303, 0.0652, 0.0628, -0.0315,
-0.0909, -0.0865, -0.0575, -0.1176, 0.0899, -0.0818, 0.0181, -0.1335,
0.1153, 0.0833]], requires_grad=True)
--
fc4 grad tensor([[ 1.0000e-06, -1.0000e-06, -1.0000e-06, -1.0000e-06, -1.0000e-06,
1.0000e-06, -1.0000e-06, 1.0000e-06, 1.0000e-06, 1.0000e-06,
-1.0000e-06, -1.0000e-06, -1.0000e-06, 1.0000e-06, 1.0000e-06,
-1.0000e-06, 1.0000e-06, 1.0000e-06, -1.0000e-06, 1.0000e-06,
1.0000e-06, 1.0000e-06, 1.0000e-06, 1.0000e-06, -1.0000e-06,
-1.0000e-06, -1.0000e-06, 1.0000e-06, 1.0000e-06, -1.0000e-06,
1.0000e-06, -1.0000e-06, -1.0000e-06, -1.0000e-06, -1.0000e-06,
-1.0000e-06, 1.0000e-06, 1.0000e-06, 1.0000e-06, -1.0000e-06,
-1.0000e-06, -1.0000e-06, -1.0000e-06, -1.0000e-06, 1.0000e-06,
-1.0000e-06, 1.0000e-06, -1.0000e-06, 1.0000e-06, 1.0000e-06]])
---------------------
fc4 weights Parameter containing:
tensor([[ 0.1043, -0.0682, -0.1152, -0.0386, -0.0275, 0.1334, -0.0945, 0.0926,
0.0040, 0.1363, -0.0191, -0.0633, -0.0360, 0.0883, 0.1090, -0.0953,
0.1240, 0.1088, -0.0929, 0.0838, 0.0143, 0.0734, 0.0216, 0.0745,
-0.0383, -0.0421, -0.0878, 0.0785, 0.0736, -0.0474, 0.1308, -0.0704,
-0.0486, -0.1310, -0.0781, -0.0973, 0.0302, 0.0652, 0.0627, -0.0314,
-0.0908, -0.0864, -0.0574, -0.1175, 0.0898, -0.0817, 0.0180, -0.1334,
0.1152, 0.0832]], requires_grad=True)
--
fc4 grad tensor([[ 2.0000e-06, -2.0000e-06, -2.0000e-06, -2.0000e-06, -2.0000e-06,
2.0000e-06, -2.0000e-06, 2.0000e-06, 2.0000e-06, 2.0000e-06,
-2.0000e-06, -2.0000e-06, -2.0000e-06, 2.0000e-06, 2.0000e-06,
-2.0000e-06, 2.0000e-06, 2.0000e-06, -2.0000e-06, 2.0000e-06,
2.0000e-06, 2.0000e-06, 2.0000e-06, 2.0000e-06, -2.0000e-06,
-2.0000e-06, -2.0000e-06, 2.0000e-06, 2.0000e-06, -2.0000e-06,
2.0000e-06, -2.0000e-06, -2.0000e-06, -2.0000e-06, -2.0000e-06,
-2.0000e-06, 2.0000e-06, 2.0000e-06, 2.0000e-06, -2.0000e-06,
-2.0000e-06, -2.0000e-06, -2.0000e-06, -2.0000e-06, 2.0000e-06,
-2.0000e-06, 2.0000e-06, -2.0000e-06, 2.0000e-06, 2.0000e-06]])