Is suitable model complexity determined more by the volume of data, or by the size of the input/output data?

For example, suppose I want to find a suitable model that predicts a value from an image.
Is the model complexity determined more by how many data entries there are, or by how large the image is (width, height, channels) and how complex the target is (a single value, or an array of values)?

Is it possible that a model can be trained well on 100,000 data entries, but not on 1,000,000 data entries?

Based on your experience, what is your opinion? Thanks.