During training, even when I fix the learning rate to 0, the parameters still get updated somehow, and I get different test results in different epochs. How is that possible? My understanding was that if lr = 0, the parameter updates should be zero.
Any help is really appreciated.

If you leave your model in training mode (i.e., not calling model.eval(), or calling model.train() explicitly), the running estimates in the batchnorm layers of your model will be updated.
So even without optimizing any parameters, your validation outputs will change after each forward pass performed in training mode.
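Here is a minimal sketch of that effect in isolation: a bare nn.BatchNorm1d layer, no optimizer and no backward pass at all, whose running statistics still change after a forward pass in train mode but stay frozen in eval mode.

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
print(bn.running_mean)        # freshly initialized: tensor([0., 0., 0., 0.])

bn.train()
_ = bn(torch.randn(8, 4))    # forward pass only, no backward(), no optimizer step
print(bn.running_mean)       # running mean has moved toward the batch statistics

bn.eval()
before = bn.running_mean.clone()
_ = bn(torch.randn(8, 4))    # in eval mode the stored statistics are used, not updated
print(torch.equal(before, bn.running_mean))  # True
```

This is why a "training" loop with lr=0 still changes the model's behavior: the forward passes themselves mutate the batchnorm buffers.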

for e in range(num_epochs):
    e_losses.append(train_epoch(X_train, Y_train, net, opt, criterion, 4))
    net.eval()
    x_t = X[id_cv, :]   # Variable is deprecated; plain tensors work
    Y_t = Y[id_cv, :]
    with torch.no_grad():
        y_t = net(x_t)
    y_ev = (y_t.squeeze() > 0.5).int().tolist()
    acc.append(accuracy_score(np.array(Y_t.squeeze().tolist()), np.array(y_ev)))
    if e % 100 == 0 and e != 0:
        plt.plot(acc)
        plt.show()

Here’s my code, where I am calling model.eval() before validation. But the validation results are still different across epochs.

Using your code, I get exactly the same results if I call model.eval() before the loop:

x = torch.randn(10, 246)
target = torch.randint(0, 2, (10, 1)).float()
criterion = nn.BCELoss()
model = MyModel()
model.eval()

outputs = []
losses = []
with torch.no_grad():
    for _ in range(10):
        output = model(x)
        loss = criterion(output, target)
        outputs.append(output)
        losses.append(loss)

outputs = torch.stack(outputs)
print((outputs[0] == outputs).all())
> tensor(1, dtype=torch.uint8)
losses = torch.stack(losses)
print((losses[0] == losses).all())
> tensor(1, dtype=torch.uint8)

Note that the results will differ if you comment out model.eval().

In your code snippet you are using train_epoch, which seems to train the model.
If so, you will naturally get different evaluation results, since the parameters have changed.
Let me know if I misunderstood something.

Thanks for the reply. Yes, train_epoch() is training the model, but since I set lr=0, I thought the validation results would be the same. It seems the batch norm layers are updating their running statistics even though the learning rate is 0. Is there any way to prevent this?
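One common way to do this (a sketch, assuming the model uses standard nn.BatchNorm* layers) is to put just the batchnorm modules in eval mode while the rest of the model stays in train mode. The helper name freeze_batchnorm below is made up for illustration:

```python
import torch
import torch.nn as nn

def freeze_batchnorm(model):
    """Put all batchnorm layers in eval mode so their running stats stay fixed."""
    for m in model.modules():
        # _BatchNorm is the common base class of BatchNorm1d/2d/3d
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            m.eval()

# Usage: call it after every model.train(), since model.train()
# puts the batchnorm layers back into training mode.
model = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
model.train()
freeze_batchnorm(model)
```

With the batchnorm layers frozen this way, a forward pass no longer updates running_mean and running_var, so with lr=0 the validation results should stay identical across epochs.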