I have modified the fc layer in a resnet18 model to output 4 logits for a classification problem. To test my model I am using the following function:
def test_model(model, test_loader, train=False):
    with torch.no_grad():
        _ = model.train() if train else model.eval()
        labs_list, prds_list, test_corrects = [], [], 0
        for i, (inps, labs) in enumerate(test_loader):
            inps, labs = inps.to(DEVICE), labs.to(DEVICE)
            outs = model(inps)
            prds = outs.argmax(dim=1)
            test_corrects += (prds == labs).sum().item()
    print(f'model.training = {str(model.training):>5}, '
          + f'test_acc = {test_corrects/len(test_loader.dataset):0.6f}')
The train argument in the function decides whether model.train() or model.eval() is used. I expected model.eval() to always produce the same accuracy. However, if I run the following piece of code:
for i in range(10):
    test_model(net, test_loader)
    test_model(net, test_loader, train=True)
i.e. running the eval and train passes alternately, the accuracy in train mode remains the same every time, but the accuracy in eval mode gradually increases until it reaches a value after which there is no further increase. This gradual saturation of accuracy is not observed if the train-mode call is removed from the for loop. This suggests that changes made to the model during each train-mode pass affect the eval mode as well, which is really perplexing.
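One way to confirm this would be to compare the model's state_dict before and after a single train-mode pass. The snippet below is only a rough probe (it assumes the net and test_loader used above) that reports which tensors actually change:

import copy
import torch

# Snapshot every parameter and buffer before a train-mode pass.
before = copy.deepcopy(net.state_dict())
test_model(net, test_loader, train=True)
after = net.state_dict()

# List the entries that changed. Since no optimizer step is taken, I would
# only expect BatchNorm buffers (running_mean, running_var,
# num_batches_tracked) to show up here.
changed = [k for k in before if not torch.equal(before[k], after[k])]
print(changed)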
I suspect it might be wrong for the test_model function to call train mode inside torch.no_grad(), but I am still not sure how that affects the model's performance in eval mode.
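As a minimal check (independent of my model), the following sketch suggests that a BatchNorm layer in train mode still updates its running statistics even inside torch.no_grad():

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
bn.train()
print(bn.running_mean)               # all zeros initially

with torch.no_grad():
    _ = bn(torch.randn(8, 3, 4, 4))  # forward pass only, no gradients

print(bn.running_mean)               # no longer zeros: buffers were updated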