Regarding the use of images in text experiments

Why do we run experiments that try to make a computer perform tasks like text classification without ever showing it any images?
Isn’t that like learning a new language when all you are ever shown is words in that language? You would end up not knowing the meaning of any word, because every one of them is in a language you do not know. Doesn’t the same thing apply to a computer?
Shouldn’t all text classification experiments also include image input, just as we learn our first language by being shown words together with associated images?