Question about PyTorch checkpoint results in testing


I have a theoretical question. In research papers we usually report test results such as accuracy, precision, and recall. But PyTorch training typically produces multiple checkpoints: some people save a checkpoint after every epoch, others after every 5 epochs, and so on.

My question is: when I run the testing code, each checkpoint passes through the loop and produces its own result, right?

Now, what is the criterion for reporting test results in a research paper? Suppose I have 10 different checkpoints, and they give 10 different test results. Can I select the checkpoint with the highest accuracy as the best model (with its weights)?

Please help in this regard!

You report the best results, obviously :stuck_out_tongue:

There is no predefined number of epochs you should run, so if your model gives its best results at the 2nd epoch, then that's it!

Well, technically you don't check test results for each checkpoint. You create a validation set and check the loss ONLY on the validation set. Based on the validation loss, you select the checkpoint with the lowest loss and run it on the test set once. What you get is what you report.
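A minimal sketch of that selection step, in plain Python so it runs without a trained model. The checkpoint file names and loss values are made up for illustration, and `select_best_checkpoint` is a hypothetical helper, not a PyTorch function. In a real project you would compute each validation loss by loading the checkpoint and evaluating it on the validation set.

```python
# Hypothetical validation losses, one per saved checkpoint (made-up numbers).
val_losses = {
    "epoch_01.pt": 0.92,
    "epoch_02.pt": 0.61,
    "epoch_03.pt": 0.48,
    "epoch_04.pt": 0.53,
    "epoch_05.pt": 0.57,
}


def select_best_checkpoint(losses):
    """Return the name of the checkpoint with the lowest validation loss."""
    return min(losses, key=losses.get)


best_ckpt = select_best_checkpoint(val_losses)
print("Checkpoint to evaluate on the test set:", best_ckpt)
```

Only the selected checkpoint is then run on the test set, and that single result is the one to report. Picking the checkpoint by test accuracy instead would leak test information into model selection.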

Thanks very much. And how do we select results in 5- or 10-fold cross-validation?

You can summarize the results of a k-fold cross-validation run with the mean of the model scores. If you want, you can also report a measure of the variance of the scores across folds, for example the standard deviation.
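As a rough sketch of that summary, using only the standard library: the per-fold accuracies below are made-up numbers for a hypothetical 5-fold run, and `summarize_folds` is an illustrative helper name. It uses the sample standard deviation (dividing by n - 1), which is one common choice; say which one you used when you report it.

```python
import statistics

# Hypothetical accuracy from each of 5 folds (made-up numbers).
fold_accuracies = [0.81, 0.79, 0.84, 0.80, 0.82]


def summarize_folds(scores):
    """Return (mean, sample standard deviation) of per-fold scores."""
    return statistics.mean(scores), statistics.stdev(scores)


mean_acc, std_acc = summarize_folds(fold_accuracies)
print(f"accuracy: {mean_acc:.3f} +/- {std_acc:.3f}")
```

The same function works for precision, recall, or any other per-fold metric.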