Image classification Input Features

When classifying images with a CNN, what should the input feature be? As far as I know, it should be 1 if the images are grayscale and 3 if the images are RGB. Is it necessary for the input feature to be
image pixels * image pixels?

In general, convolutions work on a grid, so the input can be an image of any size. The 1 or 3 you mention is the number of input channels, which is separate from the spatial size.
Some networks (usually older ones) have fully connected layers at the end, which only work with fixed-size inputs. In short, a network should work with any size, but may not.
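
To make the fixed-size issue concrete, here is a minimal sketch in plain Python that only tracks shapes. The two-layer stack in LAYERS is hypothetical (not from any particular network), and it assumes the standard output-size formula for a convolution with a given kernel, stride and padding. It shows that the convolutional part accepts any input size, but the number of flattened features a fully connected layer would receive changes with it, while global average pooling always leaves one value per channel.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of one convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map_shape(height, width, layers):
    """Push a (height, width) input through a stack of conv layers.

    Each layer is (out_channels, kernel, stride, padding).
    Returns (channels, height, width) of the final feature map.
    """
    channels = None
    for out_channels, kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
        channels = out_channels
    return channels, height, width


# Hypothetical stack: two 3x3 convolutions with stride 2 and padding 1.
LAYERS = [(16, 3, 2, 1), (32, 3, 2, 1)]

for side in (32, 64, 224):
    c, h, w = feature_map_shape(side, side, LAYERS)
    flattened = c * h * w  # input size a fully connected layer would need
    pooled = c             # size after global average pooling
    print(f"input {side}x{side}: map {(c, h, w)}, "
          f"flattened {flattened}, pooled {pooled}")
```

The flattened count differs for every input size, which is why a fully connected layer ties the network to one size. The pooled count stays at 32 here, which is one common way newer networks avoid that restriction.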

Another consideration is the apparent size of the object in the image. If a network is trained on images of size X and you feed it an image of size 10X, objects will appear far larger than anything it saw during training, and it may fail to recognize them. Imagine someone holding an object really, really close to your face: you may not recognize it.
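
One common way to deal with this is to resize the image so that it is close to the training scale before feeding it to the network. Below is a small sketch of the size calculation only. The function name rescale_to_training_size is made up for illustration, and it assumes that matching the shorter side to the training size is a good enough proxy for matching object scale, which depends on how the object is framed.

```python
def rescale_to_training_size(height, width, train_side):
    """Return (new_height, new_width) with the shorter side equal to train_side.

    The aspect ratio is preserved; the actual resampling would be done
    by whatever image library you use.
    """
    scale = train_side / min(height, width)
    new_height = max(1, round(height * scale))
    new_width = max(1, round(width * scale))
    return new_height, new_width


# An image 10 times larger than the (assumed) 224-pixel training size.
print(rescale_to_training_size(2240, 2240, 224))
# A non-square image.
print(rescale_to_training_size(480, 640, 224))
```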