I would suggest using the last one. Did you save the best model? Sometimes the best result may just be due to the particular data samples in that split.
I saved all models because I was unsure about this certain point…
The thing is, the validation set is closer to the test set than the training set is, because the model does not learn from it… So wouldn’t I choose the model with the best validation performance to achieve better test accuracy?
If you care about test set performance, you should go with the best validation loss model, because at that point the model generalizes best to unseen samples. If you simply care to show that your model can (over-)fit any dataset, you should go with the lowest training loss model. But presenting its performance will merely show that your model (with enough parameters) can find some local optimum that fits the training set well. There is no guarantee then that the model “understands” the true underlying structure of the data.
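To make the "best validation loss" selection concrete, here is a minimal sketch of early-checkpointing logic. The loss values and the `model_state` dict are hypothetical stand-ins; in PyTorch you would checkpoint `model.state_dict()` with `torch.save` instead of copying a plain dict.

```python
import copy

# Hypothetical per-epoch validation losses (illustrative numbers only)
val_losses = [0.92, 0.71, 0.64, 0.66, 0.60, 0.63]

# Stand-in for model parameters; in PyTorch this would be model.state_dict()
model_state = {"epoch": -1}

best_loss = float("inf")
best_state = None
best_epoch = None

for epoch, val_loss in enumerate(val_losses):
    model_state["epoch"] = epoch  # pretend training updated the parameters
    if val_loss < best_loss:
        # Checkpoint a copy of the parameters whenever validation improves
        best_loss = val_loss
        best_state = copy.deepcopy(model_state)
        best_epoch = epoch

print(best_epoch, best_loss)  # the epoch with the lowest validation loss
```

Note that the checkpoint from epoch 4 is kept even though training continues afterwards, which is exactly why saving only the final model can cost you test performance.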