Thanks for your answer. I still need to develop intuition in this area. In computational chemistry we also use GD and SGD, and there the sensitivity to initial conditions increases with the number of atoms, because there are more degrees of freedom. So my line of thinking was that a binary classification such as cat vs. non-cat would be less sensitive to initial conditions than a ten-class classification such as CIFAR-10. But I see your point, and the role played by the amount of data.