How can test results change even when the learning rate is 0?

Here is my simple classification model:

import torch
import torch.nn as nn

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Five Linear -> BatchNorm -> ReLU blocks, with dropout between them
        self.fc1 = nn.Linear(246, 512)
        self.bn1 = nn.BatchNorm1d(512)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, 512)
        self.bn2 = nn.BatchNorm1d(512)
        self.prelu1 = nn.ReLU()
        self.dout1 = nn.Dropout(0.5)
        self.fc3 = nn.Linear(512, 512)
        self.bn3 = nn.BatchNorm1d(512)
        self.prelu2 = nn.ReLU()
        self.dout2 = nn.Dropout(0.5)
        self.fc4 = nn.Linear(512, 512)
        self.bn4 = nn.BatchNorm1d(512)
        self.prelu3 = nn.ReLU()
        self.dout3 = nn.Dropout(0.5)
        self.fc5 = nn.Linear(512, 512)
        self.bn5 = nn.BatchNorm1d(512)
        self.prelu4 = nn.ReLU()
        # Single sigmoid output for binary classification
        self.out = nn.Linear(512, 1)
        self.out_act = nn.Sigmoid()

    def forward(self, input_):
        a1 = self.fc1(input_)
        h1 = self.relu1(self.bn1(a1))
        dout = self.dout(h1)
        a2 = self.fc2(dout)
        h2 = self.prelu1(self.bn2(a2))
        dout1 = self.dout1(h2)
        a3 = self.fc3(dout1)
        h3 = self.prelu2(self.bn3(a3))
        dout2 = self.dout2(h3)
        a4 = self.fc4(dout2)
        h4 = self.prelu3(self.bn4(a4))
        dout3 = self.dout3(h4)
        a5 = self.fc5(dout3)
        h5 = self.prelu4(self.bn5(a5))
        a6 = self.out(h5)
        y = self.out_act(a6)
        return y

During training, even when I fix the learning rate to 0, the parameters somehow still get updated and I get different test results in different epochs. How is that possible? My understanding was that if lr = 0, the parameter updates will be zero.
Any help is really appreciated.

If you leave your model in training mode (by not calling model.eval(), or by calling model.train() explicitly), the running estimates in the batchnorm layers of your model will be updated on every forward pass.
Even without optimizing any parameters, your validation outputs will therefore change after a forward pass in training mode, because evaluation uses those running estimates.
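
As a minimal sketch of this effect (a standalone toy layer, not the model from the question; the batch size, feature count and the +3.0 offset are arbitrary assumptions), a single forward pass in training mode changes running_mean even though there is no optimizer at all:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4)          # modules start in training mode by default
x = torch.randn(8, 4) + 3.0     # batch whose mean is far from the initial running mean of 0

before = bn.running_mean.clone()
with torch.no_grad():           # no gradients and no optimizer step
    bn(x)                       # the forward pass alone updates the running stats
after_train = bn.running_mean.clone()

bn.eval()                       # in eval mode the running stats are read, not written
with torch.no_grad():
    bn(x)
after_eval = bn.running_mean.clone()

print(torch.equal(before, after_train))      # False: stats moved in train mode
print(torch.equal(after_train, after_eval))  # True: stats frozen in eval mode
```

Note that torch.no_grad() only disables gradient tracking; it does not stop the running statistics from being updated.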

for e in range(num_epochs):
    e_losses.append(train_epoch(X_train, Y_train, net, opt, criterion, 4))
    net.eval()
    x_t = X[id_cv, :]
    Y_t = Y[id_cv, :]
    with torch.no_grad():
        y_t = net(x_t)
        y_ev = [x > 0.5 for x in y_t]
        yy = y_ev
        acc.append(accuracy_score(np.array(yy).astype(int), np.array(Y_t.squeeze().tolist())))
    if e % 100 == 0 and e != 0:
        plt.plot(acc)
        plt.show()

Here's my code, where I am calling model.eval() before validation. The validation results are still different for different epochs.

Using your code, I get exactly the same results if I call model.eval() before the loop:

import torch
import torch.nn as nn

# MyModel is the model class from the first post
x = torch.randn(10, 246)
target = torch.randint(0, 2, (10, 1)).float()
criterion = nn.BCELoss()
model = MyModel()
model.eval()

outputs = []
losses = []
with torch.no_grad():
    for _ in range(10):
        output = model(x)
        loss = criterion(output, target)
        outputs.append(output)
        losses.append(loss)

outputs = torch.stack(outputs)
print((outputs[0] == outputs).all())
# > tensor(1, dtype=torch.uint8)  (recent PyTorch versions print tensor(True))

losses = torch.stack(losses)
print((losses[0] == losses).all())
# > tensor(1, dtype=torch.uint8)  (recent PyTorch versions print tensor(True))

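Note that the results will differ if you comment out model.eval(), since dropout stays active and the batchnorm layers then normalize with batch statistics and keep updating their running estimates.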

In your code snippet you are using train_epoch, which seems to train the model?
If so, you will surely get different results for your evaluation, since the parameters have changed.
Let me know if I misunderstood something.

Thanks for the reply. Yes, train_epoch() is training the model, but since I set lr=0, I thought that the validation results would be the same. It seems like the batch norm is changing the parameters even though the learning rate is 0. Is there any way to prevent it?

Since your model is probably still in .train() mode, the running stats will be updated.

Try to call model.eval() before the training step.
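
Calling model.eval() on the whole model also switches dropout off. If only the batchnorm statistics should be frozen, one possible approach is to put just those layers into eval mode. The helper name freeze_batchnorm_stats and the small nn.Sequential model below are hypothetical and only serve as an illustration:

```python
import torch
import torch.nn as nn

def freeze_batchnorm_stats(model):
    """Hypothetical helper: put only the batchnorm layers into eval mode."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            m.eval()

torch.manual_seed(0)
# Toy stand-in for the real model (layer sizes are arbitrary assumptions)
model = nn.Sequential(nn.Linear(6, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Dropout(0.5))

model.train()                   # everything in training mode ...
freeze_batchnorm_stats(model)   # ... except the batchnorm layers

stats_before = model[1].running_mean.clone()
model(torch.randn(16, 6))       # forward pass as it would happen during training

stats_unchanged = torch.equal(stats_before, model[1].running_mean)
dropout_still_training = model[3].training
print(stats_unchanged, dropout_still_training)  # True True
```

Since model.train() puts every submodule back into training mode, the helper would have to be called again after each model.train() call.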

How come train() changes the weights when lr=0? I am running a training with lr=0 (with SGD) on 1 training example, and the loss changes every iteration. Why is that?

Your model might still use dropout layers, which will also change the loss but will of course not update the parameters.
An optimizer without momentum or any internal stats should not update any parameters with a learning rate of 0, so could you post a minimal and executable code snippet to reproduce the issue, please?
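
A rough way to check both statements, assuming a small toy model with one dropout layer and plain SGD (no momentum, no weight decay); the layer sizes, the MSE loss and the random data are placeholders rather than the setup from the question:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 1))
model.train()
opt = torch.optim.SGD(model.parameters(), lr=0.0)  # plain SGD with a zero learning rate
criterion = nn.MSELoss()
x, y = torch.randn(1, 10), torch.randn(1, 1)       # a single training example

initial = [p.detach().clone() for p in model.parameters()]
losses = []
for _ in range(5):
    opt.zero_grad()
    loss = criterion(model(x), y)   # a fresh dropout mask is sampled on every forward pass
    loss.backward()
    opt.step()                      # the update is lr * grad, which is 0 here
    losses.append(loss.item())

params_unchanged = all(torch.equal(p0, p) for p0, p in zip(initial, model.parameters()))
print(params_unchanged)  # True
print(losses)            # the values vary between iterations because of dropout
```

Comparing the parameters directly, rather than watching the loss, separates "the weights changed" from "the forward pass is stochastic".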

You are absolutely right! Thanks! Once I scanned all the dropout layers and set their p=0 before the training, I could see that a learning rate of 0 indeed results in no change in the loss over iterations.
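
For reference, a sketch of what that scan could look like. The helper name disable_dropout and the toy model are hypothetical, and only nn.Dropout is handled here (variants such as nn.Dropout2d would need to be added to the isinstance check):

```python
import torch
import torch.nn as nn

def disable_dropout(model):
    """Hypothetical helper: set the drop probability of every nn.Dropout layer to 0."""
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = 0.0

torch.manual_seed(0)
# Toy stand-in for the real model (layer sizes are arbitrary assumptions)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 1))
model.train()            # stay in training mode on purpose
disable_dropout(model)

x = torch.randn(1, 10)
with torch.no_grad():
    out1 = model(x)
    out2 = model(x)

same_output = torch.equal(out1, out2)
print(same_output)  # True: with p=0 the forward pass is deterministic again
```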