Is there any research or mature model for predicting a number from an image?

As far as I know, a CNN is well suited to recognizing patterns in an image and connecting them with a label. What about recognizing the patterns and connecting them with a number? For example, given an image of a cross-section of a porous medium, the model would estimate, at a glance, the resistance when a certain fluid flows through it.

Is a CNN suitable for this work? Or is some modified version of a CNN suitable? Is there a well-known public dataset for testing this kind of model, like MNIST or CIFAR10?


Hi David!

Yes, using a neural network to predict a number, rather than a class label,
is perfectly reasonable.

This kind of problem is sometimes called “regression,” in contrast to
“classification.”

The notion is that you are predicting a continuous number that can be
closer (better) to or further away (worse) from the ground-truth value
for that number.

The upstream “image-processing” part of the model might be the same
for a classification and a regression problem, with only the last layer (or
the last few layers) differing. If you are predicting a single number, your
last layer might be a Linear with out_features = 1. You would then
typically use MSELoss as the loss function that compares your predicted
value with your ground-truth value.
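
As a minimal sketch of the setup described above (not a tuned model), the
following assumes single-channel 32x32 images and uses random tensors as
stand-ins for real data. The class name TinyRegressor, the layer sizes, and
the learning rate are all hypothetical choices made for illustration.

```python
import torch
import torch.nn as nn


class TinyRegressor(nn.Module):
    """Hypothetical small CNN that maps a 1-channel image to one number."""

    def __init__(self):
        super().__init__()
        # Upstream "image-processing" part; it could equally serve a classifier.
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(8, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Regression head: a Linear layer with out_features = 1.
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        x = self.features(x).flatten(1)  # shape (N, 16)
        return self.head(x)              # shape (N, 1)


torch.manual_seed(0)
model = TinyRegressor()
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Random stand-ins for a batch of images and their ground-truth numbers.
images = torch.randn(4, 1, 32, 32)
targets = torch.randn(4, 1)

# One training step: predict, compare with MSELoss, backpropagate, update.
pred = model(images)
loss = loss_fn(pred, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Note that the targets are shaped (N, 1) to match the predictions; passing
targets of shape (N,) to MSELoss would broadcast against the (N, 1) output
and compute the wrong loss.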

I don’t know offhand about any specific research about this, but I’m sure
a lot is out there, as this is a common use case.


K. Frank

Thanks, Frank!

What about the activation functions in the middle layers? If I use ReLU, I am afraid that a lot of information will be lost. But if I don’t use ReLU, I am afraid the picture’s texture (or call it “context” or “pattern”) will not be picked up, and the deep network will behave just like a fully connected regression.

Hi David!

The choice of activation functions – and more generally, the choice of
network architecture – will depend on the details of your use case, and
the preferred choices are determined, in practice, by experimentation
(or prior experience), rather than theoretical considerations.

For example, consider the “regression” of predicting a person’s body-mass
index from an image. You could well imagine that a network architecture
(and perhaps even a pre-trained network) for classifying people vs. cats
might do very well at recognizing the relevant “people features,” with only
the output-the-BMI layer being different.
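
One way to picture reusing a classification network with only the output
layer changed is sketched below. TinyClassifier is a hypothetical stand-in
for a trained classifier, and freezing the backbone is an assumption made
for illustration; whether to freeze or fine-tune it is something to settle
by experiment.

```python
import torch
import torch.nn as nn


class TinyClassifier(nn.Module):
    """Hypothetical stand-in for a trained image classifier."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc = nn.Linear(8, num_classes)  # classification head

    def forward(self, x):
        return self.fc(self.backbone(x))


# Imagine this model has already been trained on a classification task.
net = TinyClassifier(num_classes=2)

# Keep the learned features fixed (an assumption; fine-tuning is also common).
for param in net.backbone.parameters():
    param.requires_grad = False

# Swap the classification head for a single-output regression head.
net.fc = nn.Linear(net.fc.in_features, 1)

# The network now emits one number per image.
out = net(torch.randn(5, 3, 64, 64))
trainable = [name for name, p in net.named_parameters() if p.requires_grad]
```

After the swap, only the new head's weight and bias remain trainable, so an
optimizer built from the parameters with requires_grad set would update the
regression layer alone.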

A network used for semantic segmentation of vegetation vs. dirt in satellite
images – seemingly a texture-centric task – might well be a better fit for
your use case.

The point being that the nature of the images and the relevant features
are more important for the choice of network architecture (and activation
functions) than is whether you are performing regression or classification.


K. Frank