Problem with my best parameter save algorithm

Hi, I have a structure below:

if valid_loss <= valid_loss_min:
    a, b, c, d = self.test(testloader)
    best_state = copy.deepcopy(self.state_dict())
    valid_loss_min = valid_loss

This structure is in my training loop, and I have a test function. When I call self.test in the code above, I expect the best a, b, c, d (F1, accuracy, recall, precision) values I get to be the same as the outputs of the test function I call after the training loop is over. But the values are different. Where is my mistake?

(I do self.load_state_dict(...) with the saved parameters after my training loop is finished.)
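For reference, the snapshot-then-restore pattern above can be sketched with plain dicts standing in for a state_dict (the names here, like best_state, are hypothetical):

```python
import copy

# A plain dict stands in for model.state_dict(); in PyTorch the returned
# tensors keep being updated in place by training, which is why a deepcopy
# (rather than a plain assignment) is needed for the snapshot.
params = {"w": [1.0, 2.0]}
best_state = copy.deepcopy(params)  # snapshot at the best validation loss
params["w"][0] = 99.0               # training continues to mutate the weights

# best_state still holds the old value; a plain `best_state = params`
# would alias the same underlying storage and the snapshot would drift.
```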

Are you calling model.eval() while computing the test stats?
Also, are you able to get the same values outside of this valid_loss condition just by repeatedly calling into the self.test method?

Yes, I call model.eval() while computing the test metrics. However, I am not able to get the same values outside of this valid_loss condition just by repeatedly calling into the self.test method. When I turn off shuffle for the dataloader, I get the same result every time.

Here is the computation of a,b,c,d →

def test(self,testloader):
        test_score_f1 = 0
        test_score_accuracy = 0
        test_score_recall = 0
        test_score_presicion = 0
        for data, target in testloader:
            with torch.no_grad():
                output = self(data)
            _,output = torch.max(output,1)
            test_score_f1 = test_score_f1 + self.f1(output,target)
            test_score_accuracy = test_score_accuracy + self.accuracy(output,target)
            test_score_recall = test_score_recall + self.recall(output,target)
            test_score_presicion = test_score_presicion + self.presicion(output,target)
        test_score_f1 = data.shape[0]*test_score_f1/len(test_dataset)
        test_score_accuracy = data.shape[0]*test_score_accuracy/len(test_dataset)
        test_score_recall = data.shape[0]*test_score_recall/len(test_dataset)
        test_score_presicion = data.shape[0]*test_score_presicion/len(test_dataset)
        return test_score_f1,test_score_accuracy,test_score_recall,test_score_presicion
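One order-dependent spot worth checking in this function (not necessarily the only one) is the averaging itself: it sums per-batch *means* and then rescales by the size of the *last* batch, so whenever the final batch is smaller, both the result and its dependence on shuffle order change. A small self-contained demonstration with accuracy-style 0/1 correctness flags (all names hypothetical):

```python
# 5 samples with batch size 2 -> batches of sizes 2, 2, 1.
# batchwise_score mimics the posted test(): accumulate each batch's mean,
# then multiply by the last batch size and divide by the dataset size.
def batchwise_score(correct_flags, batch_size):
    batches = [correct_flags[i:i + batch_size]
               for i in range(0, len(correct_flags), batch_size)]
    score = sum(sum(b) / len(b) for b in batches)  # sum of per-batch means
    last = len(batches[-1])                        # size of the final batch
    return last * score / len(correct_flags)

order_a = [1, 1, 1, 1, 0]  # the one wrong prediction lands in the size-1 batch
order_b = [0, 1, 1, 1, 1]  # same samples, shuffled

print(batchwise_score(order_a, 2))       # 0.4
print(batchwise_score(order_b, 2))       # 0.5
print(sum(order_a) / len(order_a))       # 0.8, the true accuracy
```

Both values disagree with the true accuracy and with each other, purely because of sample order. Accumulating per-sample counts (or weighting each batch's mean by its own size) removes the order dependence for accuracy; F1, precision, and recall are additionally not batch-decomposable, so averaging them per batch differs from computing them over the whole test set.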

This would indicate a dependency on the actual order of batches, which shouldn't be the case if model.eval() is working properly. I would recommend trying to narrow down which operation is creating different outputs depending on the order of samples.


Thanks. I could not understand why model.eval() was not working properly. I was thinking that the norm layer causes the issue you mention, because there is no other layer that could affect the outputs, but when I turn off the norm layer I run into the same problem.
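For intuition on why a norm layer is a reasonable suspect here: a batch-norm-style layer that normalizes with *batch* statistics (train mode) makes a sample's output depend on its batchmates, i.e. on shuffle order, whereas eval mode uses fixed running statistics. A toy sketch of just the normalization step (pure Python, names hypothetical):

```python
import statistics

def normalize_batch(batch):
    # Normalize each value with the batch's own mean and std,
    # like BatchNorm in train mode (running stats would make it order-free).
    mu = statistics.fmean(batch)
    sd = statistics.pstdev(batch) or 1.0  # guard against zero std
    return [(x - mu) / sd for x in batch]

sample = 2.0
out_a = normalize_batch([sample, 0.0, 4.0])    # batchmates 0 and 4
out_b = normalize_batch([sample, 10.0, 12.0])  # different batchmates

# The same input sample gets a different normalized value in each batch.
```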