This structure is in my training loop, and I have a test function. When I call self.test in the code above, I expect the best a, b, c, d values (precision, recall, F1, accuracy) I get to match the outputs of the test function I call after the training loop is over. But the values are different. Where is my mistake?
(I do self.load_state_dict(self.best) after my training loop is finished.)
Are you calling model.eval() while computing the test stats?
Also, are you able to get the same values outside of this valid_loss condition just by repeatedly calling self.test?
Yes, I call model.eval() while computing the test stats. Also, I am not able to get the same values outside of this valid_loss condition just by repeatedly calling self.test. When I turn off shuffle for the DataLoader, I get the same result.
This would indicate a dependency on the actual order of batches, which shouldn’t be the case if model.eval() is working properly. I would recommend trying to narrow down which operation is creating different outputs depending on the order of samples.
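One candidate worth checking (this is a guess, since we can't see your self.test): if the test metrics are computed per batch and then averaged, the result depends on how samples are grouped, so shuffling alone changes it even with a fully deterministic model. A minimal sketch with made-up labels and predictions:

```python
# Hypothetical illustration: averaging a per-batch metric (F1 here) is
# order-dependent, while computing it once over all samples is not.

def f1(y_true, y_pred):
    """Binary F1 from raw true-positive/false-positive/false-negative counts."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Fixed (made-up) model outputs; only the iteration order changes below.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 1]

def batched_avg_f1(order, batch_size=4):
    """Mean of per-batch F1 scores, taking samples in the given order."""
    scores = []
    for i in range(0, len(order), batch_size):
        idx = order[i:i + batch_size]
        scores.append(f1([y_true[j] for j in idx], [y_pred[j] for j in idx]))
    return sum(scores) / len(scores)

global_f1 = f1(y_true, y_pred)                    # ~0.667, order-independent
avg_a = batched_avg_f1(list(range(8)))            # 0.65 in sequential order
avg_b = batched_avg_f1([0, 4, 2, 7, 1, 3, 6, 5])  # 0.70 in a shuffled order
```

If this is the cause, accumulating the tp/fp/fn counts (or all predictions and targets) over the entire test set and computing precision/recall/F1 once at the end makes the result independent of batch order.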
Thanks, I could not understand why model.eval() is not working properly. I was thinking the norm layer causes the issue you mention, because there is no other "thing" that could have an effect, but when I turn off the norm layer I run into the same problem.