How to choose optimal number of epochs for training deep learning model using early stopping?

I have trained a classification model in PyTorch for 15 epochs and got the following results.

| Epoch | Time   | train_loss | train_acc | valid_loss | valid_acc |
|-------|--------|------------|-----------|------------|-----------|
| 01    | 0m 37s | 0.8484     | 46.28     | 0.4424     | 81.81     |
| 02    | 0m 37s | 0.3786     | 80.93     | 0.3330     | 86.85     |
| 03    | 0m 37s | 0.2688     | 86.35     | 0.2898     | 84.34     |
| 04    | 0m 37s | 0.2084     | 89.25     | 0.2987     | 87.86     |
| 05    | 0m 37s | 0.1793     | 90.63     | 0.3073     | 89.16     |
| 06    | 0m 37s | 0.1569     | 91.87     | 0.2815     | 85.59     |
| 07    | 0m 37s | 0.1311     | 93.26     | 0.2887     | 85.51     |
| 08    | 0m 37s | 0.1125     | 94.00     | 0.2922     | 85.94     |
| 09    | 0m 37s | 0.1058     | 94.23     | 0.3030     | 89.53     |
| 10    | 0m 37s | 0.0921     | 94.66     | 0.3366     | 90.01     |
| 11    | 0m 37s | 0.0763     | 95.65     | 0.3333     | 89.54     |
| 12    | 0m 37s | 0.0732     | 95.78     | 0.3703     | 89.95     |
| 13    | 0m 37s | 0.0694     | 95.90     | 0.3803     | 89.90     |
| 14    | 0m 37s | 0.0635     | 96.22     | 0.3730     | 88.09     |
| 15    | 0m 37s | 0.0624     | 96.27     | 0.3526     | 85.75     |

I want to apply early stopping. Which of the following is more appropriate, and why?

a) early stopping on the basis of validation loss

or

b) early stopping on the basis of validation accuracy
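For context, this is roughly the stopping rule I have in mind (a minimal sketch; the class and parameter names are my own placeholders, not from any particular library):

```python
class EarlyStopper:
    """Stop training when a monitored metric stops improving.

    mode="min" for metrics to minimize (e.g. valid_loss),
    mode="max" for metrics to maximize (e.g. valid_acc).
    """

    def __init__(self, patience=3, mode="min"):
        self.patience = patience
        self.mode = mode
        self.best = None
        self.bad_epochs = 0

    def step(self, value):
        """Record one epoch's metric; return True when training should stop."""
        improved = (self.best is None or
                    (value < self.best if self.mode == "min" else value > self.best))
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In the training loop I would call `stopper.step(valid_loss)` (or `stopper.step(valid_acc)` with `mode="max"`) once per epoch and break when it returns `True`.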

@ptrblck

I’m not @ptrblck, but if you’re willing to also hear from someone else:
If you think of this as an optimization problem, consider which metric you actually want to optimize. I guess the more common criterion is accuracy (which we cannot use directly as a training loss because it is not differentiable).
Note that you would need a separate test set to measure the resulting accuracy, because once you use early stopping, the validation set has effectively become part of training.
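To make this concrete, here is a quick sketch applying a simple patience rule (patience = 3, an arbitrary choice) to the validation numbers you posted, to see which epoch each criterion would pick:

```python
# Rounded valid_loss / valid_acc values from the 15 epochs posted above.
valid_loss = [0.4424, 0.3330, 0.2898, 0.2987, 0.3073, 0.2815, 0.2887,
              0.2922, 0.3030, 0.3366, 0.3333, 0.3703, 0.3803, 0.3730, 0.3526]
valid_acc = [81.81, 86.85, 84.34, 87.86, 89.16, 85.59, 85.51, 85.94,
             89.53, 90.01, 89.54, 89.95, 89.90, 88.09, 85.75]

def best_epoch(values, mode, patience=3):
    """Return (best_epoch, stop_epoch), 1-indexed, under patience-based stopping."""
    best_idx, bad = 0, 0
    for i, v in enumerate(values[1:], start=1):
        improved = v < values[best_idx] if mode == "min" else v > values[best_idx]
        if improved:
            best_idx, bad = i, 0
        else:
            bad += 1
            if bad >= patience:
                return best_idx + 1, i + 1
    return best_idx + 1, len(values)

print(best_epoch(valid_loss, "min"))  # loss criterion     -> (6, 9)
print(best_epoch(valid_acc, "max"))   # accuracy criterion -> (5, 8)
```

So with these numbers and this patience value, monitoring valid_loss stops at epoch 9 and keeps epoch 6 as best, while monitoring valid_acc stops earlier (epoch 8, best epoch 5) and never reaches epoch 10's 90.01% peak. That is one practical drawback of stopping on the noisier accuracy curve.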


Thanks for the information. Could you please explain this more clearly? Are there any drawbacks to applying early stopping on the basis of validation accuracy? @tom