Model parameters not updating during training

Hello,

I’m a student working on offline character recognition, and I’ve run into an issue where model.parameters() does not seem to update during training. Below is a copy of my forward and my training step. I use F.log_softmax() in forward and F.nll_loss() as the loss function. All the intermediate tensors (output, input, loss, etc.) look correct, but when I compute “post - pre” there is no change in the model parameters, which is very confusing.

print("shape coming in is "+str(x.shape))
x = F.max_pool2d(F.relu(self.conv1(x)), self.kernel)
print("shape after round 1 is "+ str(x.shape))
x = F.max_pool2d(F.relu(self.conv2(x)), self.kernel)
print("shape after round 2 is "+str(x.shape))
x = F.max_pool2d(F.relu(self.conv3(x)), self.kernel)
print("shape after round 3 is "+str(x.shape))
x = x.view(-1, self.flatten_features(x))
print("shape after round 4 view is "+str(x.shape))
x = F.relu(self.fc1(x))
print("shape after round 5 linear 1 is "+str(x.shape))
x = self.fc2(x)
print("shape after round 6 linear 2 is "+str(x.shape))
return F.log_softmax(x)

optimizer.zero_grad()
print("OUTPUT: {} {}".format(output, output.shape))
loss = F.nll_loss(output, chin_char)
print("LOSS: {} {}".format(loss, loss.shape))
pre = list(model.parameters())[0]
loss.backward()
optimizer.step()
post = list(model.parameters())[0]

Could you share how you defined the optimizer? Maybe the parameters are not being passed to it correctly. Also, what learning rate are you using?

optimizer = optim.SGD(model.parameters(), lr=args.l_rate, momentum=args.momentum)

Learning rate is 0.01.

The parameters actually do change, but pre and post look the same because list(model.parameters())[0] does not create a new copy of the tensor values; both names refer to the same parameter tensor, which optimizer.step() updates in place. Try printing them before and after the optimizer.step() line.

I have a simple example for this:

>>> import torch
>>> import torch.nn as nn
>>> import torch.nn.functional as F
>>> model = nn.Linear(4, 2)
>>> x = torch.randn(10, 4)
>>> y = torch.LongTensor([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> # compute the output
>>> out = F.log_softmax(model(x), dim=1)
>>> pre = list(model.parameters())[0]
>>> # print the parameters before update:
>>> print(pre)
Parameter containing:
tensor([[ 0.4893, -0.4020,  0.3605, -0.3534],
        [-0.3695,  0.3736,  0.1374,  0.3506]], requires_grad=True)

Now, we do the update step:

>>> optimizer.zero_grad()
>>> loss = F.nll_loss(out, y)
>>> loss.backward()
>>> optimizer.step()
>>> post = list(model.parameters())[0]
>>> print(pre)
Parameter containing:
tensor([[ 0.4812, -0.3906,  0.3314, -0.3524],
        [-0.3614,  0.3621,  0.1665,  0.3496]], requires_grad=True)
>>> print(post)
Parameter containing:
tensor([[ 0.4812, -0.3906,  0.3314, -0.3524],
        [-0.3614,  0.3621,  0.1665,  0.3496]], requires_grad=True)

So you can see that pre and post hold the same values after the update step, even though the parameters printed before the update were different: pre is not a snapshot, it is a reference to the same parameter tensor that optimizer.step() modifies in place. If you want a real before/after comparison, clone the parameter first, as in the sketch below.
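Here is a minimal sketch of that comparison. It just reuses the same toy model, data, and optimizer as the example above (the names are only for illustration); the key difference is that the snapshot is taken with .detach().clone(), so post - pre reflects the actual update:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Same toy setup as in the example above.
model = nn.Linear(4, 2)
x = torch.randn(10, 4)
y = torch.LongTensor([1, 0, 1, 1, 0, 1, 0, 1, 1, 1])
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Take a real snapshot: detach().clone() copies the values instead of aliasing the parameter.
pre = list(model.parameters())[0].detach().clone()

out = F.log_softmax(model(x), dim=1)
loss = F.nll_loss(out, y)

optimizer.zero_grad()
loss.backward()
optimizer.step()

post = list(model.parameters())[0]
print(post - pre)              # non-zero wherever the gradients were non-zero
print(torch.equal(post, pre))  # False once the step has changed the weights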