What affects quality/accuracy of image classification results?

Forelli · September 14, 2020, 8:18am

Hi

For a continuing-education course I analyzed PyTorch for image classification (using the Kaggle Dogs vs. Cats dataset). In particular, I investigated what influences the quality/accuracy of the results.

I claim that the following factors are the most important (in descending order of importance):

amount of data (I recommend at least 20,000 images for classification with two classes; the Dogs vs. Cats example has 25,000)
data quality
train/test split (best results at 90%/10%)
model used
script modifications (e.g. optimizer, transforms)
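
To make the 90%/10% split from the list above concrete, here is a rough sketch in plain Python. It is not my actual training script: the helper name `split_train_test`, the seed, and the file names are assumptions made up for illustration, standing in for the 25,000 images. In a PyTorch script, `torch.utils.data.random_split` would be the usual way to get a similar split on a dataset object.

```python
import random


def split_train_test(items, test_fraction=0.1, seed=42):
    """Shuffle a copy of `items` and return (train, test) lists.

    The fixed seed keeps the split reproducible between runs, so
    accuracy differences are not caused by a different split.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for 12,500 cat and 12,500 dog images.
files = [f"cat.{i}.jpg" for i in range(12500)]
files += [f"dog.{i}.jpg" for i in range(12500)]

train_files, test_files = split_train_test(files, test_fraction=0.1)
print(len(train_files), len(test_files))  # 22500 2500
```

Shuffling before splitting matters here because the file list is sorted by class; without it the test set would contain only one of the two classes.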

The following properties have only a small influence on accuracy:

Size of the images
Alignment of the images

Furthermore, I claim that the hardware used (CPU or CUDA) does not influence the quality of the results, only how long a run takes.

One more question for long-time users: have new versions of PyTorch significantly improved the quality/accuracy, or does that depend only on the model? Unfortunately I cannot test this.

If you were a professor, would you accept this, or have I forgotten important points?

Cheers
Forelli