How can test results change even when the learning rate is 0?

sayangsep · June 23, 2019, 12:08am

Here is my simple classification model:

import torch.nn as nn


class MyModel(nn.Module):  # class name as used in the reply below
    def __init__(self):
        super().__init__()
        # Five Linear -> BatchNorm -> ReLU blocks, with dropout between them.
        # Note: the prelu* attributes are plain nn.ReLU modules despite their names.
        self.fc1 = nn.Linear(246, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.prelu1 = nn.ReLU()
        self.dout1 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(512, 512)
        self.bn3 = nn.BatchNorm1d(512)
        self.prelu2 = nn.ReLU()
        self.dout2 = nn.Dropout(0.5)
        self.fc4 = nn.Linear(512, 512)
        self.bn4 = nn.BatchNorm1d(512)
        self.prelu3 = nn.ReLU()
        self.dout3 = nn.Dropout(0.5)
        self.fc5 = nn.Linear(512, 512)
        self.bn5 = nn.BatchNorm1d(512)
        self.prelu4 = nn.ReLU()
        self.out = nn.Linear(512, 1)
        self.out_act = nn.Sigmoid()

    def forward(self, input_):
        a1 = self.fc1(input_)
        h1 = self.relu1(self.bn1(a1))
        dout = self.dout(h1)
        a2 = self.fc2(dout)
        h2 = self.prelu1(self.bn2(a2))
        dout1 = self.dout1(h2)
        a3 = self.fc3(dout1)
        h3 = self.prelu2(self.bn3(a3))
        dout2 = self.dout2(h3)
        a4 = self.fc4(dout2)
        h4 = self.prelu3(self.bn4(a4))
        dout3 = self.dout3(h4)
        a5 = self.fc5(dout3)
        h5 = self.prelu4(self.bn5(a5))
        a6 = self.out(h5)
        y = self.out_act(a6)  # sigmoid output in (0, 1)
        return y

During training, even when I fix the learning rate to 0, the parameters still somehow get updated and I get different test results in different epochs. How is that possible? My understanding was that if lr = 0, the parameter update will be zero.
Any help is really appreciated.

ptrblck · June 23, 2019, 1:17am

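If you leave your model in training mode (by not calling model.eval(), or by calling model.train() explicitly), the running estimates (running_mean and running_var) in your batchnorm layers will be updated on every forward pass. They are buffers, not parameters, so the optimizer and its learning rate never touch them.
Even without optimizing any parameters, your validation outputs will therefore change after a forward pass in training mode.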
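
A minimal sketch of this effect, assuming a small stand-in network (not the model from the question) and plain SGD with lr=0: after one training step the parameters are bit-for-bit identical, while the batchnorm running mean has moved.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small stand-in model (hypothetical, not the poster's network).
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.0)
criterion = nn.MSELoss()
bn = model[1]

# Snapshot the learnable parameters and the batchnorm running mean.
weights_before = [p.detach().clone() for p in model.parameters()]
running_mean_before = bn.running_mean.clone()

# One ordinary training step in training mode.
model.train()
x = torch.randn(16, 4)
target = torch.randn(16, 1)
optimizer.zero_grad()
loss = criterion(model(x), target)
loss.backward()
optimizer.step()

# The optimizer step with lr=0 leaves every parameter untouched ...
weights_unchanged = all(
    torch.equal(before, after) for before, after in zip(weights_before, model.parameters())
)
# ... but the forward pass already updated the running statistics.
stats_changed = not torch.equal(running_mean_before, bn.running_mean)
print("weights unchanged:", weights_unchanged)
print("running stats changed:", stats_changed)
```

Because model.eval() makes batchnorm normalize with these running statistics, any change to them shows up in the validation outputs.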

sayangsep · June 23, 2019, 1:35am

for e in range(num_epochs):
    e_losses.append(train_epoch(X_train, Y_train, net, opt, criterion, 4))
    net.eval()
    x_t = X[id_cv, :]  # the deprecated Variable wrapper is not needed
    Y_t = Y[id_cv, :]
    with torch.no_grad():
        y_t = net(x_t)
        # Threshold the sigmoid outputs at 0.5 to get class predictions.
        y_ev = (y_t > 0.5).squeeze().tolist()
        acc.append(accuracy_score(np.array(Y_t.squeeze().tolist()), np.array(y_ev).astype(int)))
    if e % 100 == 0 and e != 0:
        plt.plot(acc)
        plt.show()

Here’s my code, where I call model.eval() before validation. The validation results still differ between epochs.

ptrblck · June 23, 2019, 11:13am

Using your code, I get exactly the same results if I call model.eval() before the loop:

import torch
import torch.nn as nn

x = torch.randn(10, 246)
target = torch.randint(0, 2, (10, 1)).float()
criterion = nn.BCELoss()
model = MyModel()  # the model defined in the first post
model.eval()

outputs = []
losses = []
with torch.no_grad():
    for _ in range(10):
        output = model(x)
        loss = criterion(output, target)
        outputs.append(output)
        losses.append(loss)

outputs = torch.stack(outputs)
print((outputs[0] == outputs).all())
> tensor(1, dtype=torch.uint8)

losses = torch.stack(losses)
print((losses[0] == losses).all())
> tensor(1, dtype=torch.uint8)

Note that the results will differ if you comment out model.eval().

In your code snippet you are using train_epoch, which seems to train the model?
If so, you will surely get different results for your evaluation, since the model changed in between.
Let me know if I misunderstood something.

sayangsep · June 23, 2019, 5:59pm

Thanks for the reply. Yes, train_epoch() trains the model, but since I set lr=0, I thought the validation results would stay the same. It seems that the batch norm layers are changing the model's state even though the learning rate is 0. Is there any way to prevent it?

ptrblck · June 23, 2019, 6:01pm

Since your model is probably still in .train() mode, the running stats will be updated.

Try to call model.eval() before the training step.
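
Calling model.eval() on the whole model also disables dropout. If only the running statistics should stay fixed, one possible alternative is to switch just the batchnorm layers to eval mode. The sketch below assumes a hypothetical helper named freeze_batchnorm_stats and a small stand-in model; neither is from the original code.

```python
import torch
import torch.nn as nn

BATCHNORM_TYPES = (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)


def freeze_batchnorm_stats(model: nn.Module) -> None:
    """Hypothetical helper: put only the batchnorm layers into eval mode."""
    for module in model.modules():
        if isinstance(module, BATCHNORM_TYPES):
            module.eval()


torch.manual_seed(0)
# Stand-in model for illustration only.
model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Dropout(0.5))

model.train()                  # everything in training mode, dropout stays active
freeze_batchnorm_stats(model)  # batchnorm alone switches to eval mode

before = model[1].running_mean.clone()
model(torch.randn(16, 4))      # forward pass while the rest of the model trains
stats_frozen = torch.equal(before, model[1].running_mean)
print("running stats frozen:", stats_frozen)
```

Two caveats: model.train() puts every submodule back into training mode, so the helper has to be called again after each model.train(); and a batchnorm layer in eval mode normalizes with its running statistics instead of the batch statistics, which changes the forward pass during training.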

ndvbd · April 13, 2023, 12:38pm

How come train() changes the weights when lr=0? I am running training with lr=0 (with SGD) on 1 training example, and the loss changes every iteration.

ptrblck · April 13, 2023, 4:26pm

Your model might still use dropout layers, which randomly zero activations in training mode and so change the loss from one iteration to the next, but of course do not update the parameters.
An optimizer without momentum or any other internal stats should not update any parameters with a learning rate of 0, so could you post a minimal and executable code snippet to reproduce the issue, please?
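
As a rough illustration (a sketch with a made-up small network and a single random training example, not the code from the question), the loop below trains with plain SGD at lr=0. The loss fluctuates because dropout samples a new mask on every forward pass, while the parameters stay identical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model with one dropout layer (hypothetical).
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.0)  # plain SGD, no momentum
criterion = nn.MSELoss()

# A single training example, as in the question.
x = torch.randn(1, 4)
target = torch.randn(1, 1)

params_before = [p.detach().clone() for p in model.parameters()]

model.train()
losses = []
for _ in range(10):
    optimizer.zero_grad()
    loss = criterion(model(x), target)  # new dropout mask on every call
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

params_unchanged = all(
    torch.equal(before, after) for before, after in zip(params_before, model.parameters())
)
loss_varies = len(set(losses)) > 1
print("parameters unchanged:", params_unchanged)
print("loss varies across iterations:", loss_varies)
```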

ndvbd · April 14, 2023, 7:08am

You are absolutely right! Thanks! Once I went through all the dropout layers and set their p=0 before training, I could see that a learning rate of 0 indeed results in no change in loss over iterations.
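
One way this could be done is sketched below. The helper name disable_dropout and the small model are assumptions for illustration. It only handles nn.Dropout modules; variants such as nn.Dropout2d or calls to the functional dropout API inside forward() would need separate handling.

```python
import torch
import torch.nn as nn


def disable_dropout(model: nn.Module) -> int:
    """Hypothetical helper: set p=0 on every nn.Dropout and return how many were changed."""
    count = 0
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = 0.0
            count += 1
    return count


torch.manual_seed(0)
# Stand-in model for illustration only.
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 1))
model.train()
num_changed = disable_dropout(model)

# With p=0 the dropout layer passes activations through unchanged,
# so two forward passes in training mode give identical outputs.
x = torch.randn(1, 4)
with torch.no_grad():
    same_output = torch.equal(model(x), model(x))
print(num_changed, same_output)
```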