Localization Information from a Convolutional Neural Network

collinbrake · June 17, 2020, 2:33pm

I am attempting to extract the pixel location of an object (a row of plants, in my application) from an image. From this related topic I have gathered that CNNs are not designed to provide location information about objects in an image, but rather to detect an object anywhere in an image by building a translation-invariant model.
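The convolution layers themselves are translation-equivariant: shifting the input shifts the feature map. The invariance typically comes from a later global pooling (or similar) step that discards position. A minimal PyTorch sketch, assuming a single randomly initialized 3x3 convolution with circular padding (chosen only so that shifts wrap around without border effects) and a synthetic image containing one vertical line:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# One 3x3 convolution with circular padding, so shifting the input
# shifts the output by exactly the same amount (no border effects).
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, padding_mode="circular", bias=False)

x = torch.zeros(1, 1, 8, 32)
x[..., 10] = 1.0                              # a vertical "row" at column 10
x_shifted = torch.roll(x, shifts=6, dims=3)   # the same row moved to column 16

with torch.no_grad():
    y, y_shifted = conv(x), conv(x_shifted)

# Equivariance: the feature map moves together with the object ...
print(torch.allclose(torch.roll(y, shifts=6, dims=3), y_shifted, atol=1e-6))  # True
# ... but global average pooling throws the position away.
print(torch.allclose(y.mean(dim=(2, 3)), y_shifted.mean(dim=(2, 3)), atol=1e-6))  # True
```

So the positional information is still present in the intermediate feature maps; it is the pooling or classification head that removes it.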

Is there any way to tease this positional information out of a CNN, or is a fully-connected model the route to take? Could a CNN be trained to treat an image with a vertical row of plants at, say, column #16 as an entirely different class than an image with a row of plants at column #244? In that case I envision a fixed set of classes, one per image column, and the network would return 16 for the image with the row at column #16.
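The per-column classification idea can be sketched as a fully convolutional network whose width-preserving layers keep column j of the feature map aligned with column j of the input, followed by averaging over the height to get one logit per column. This is a hypothetical sketch (`ColumnClassifier` and all sizes are made up), and it assumes every image has the same width W, since the number of classes equals W:

```python
import torch
import torch.nn as nn

class ColumnClassifier(nn.Module):
    """Hypothetical fully convolutional net that outputs one logit per image column.

    Convolutions with kernel_size=3 and padding=1 keep the width unchanged, so
    column j of the feature map still lines up with column j of the input.
    Averaging over the height then leaves one score per column.
    """

    def __init__(self, in_channels: int = 3, hidden: int = 16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, 1, kernel_size=1),  # a single score map
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scores = self.features(x)              # (N, C, H, W) -> (N, 1, H, W)
        return scores.mean(dim=2).squeeze(1)   # collapse rows -> (N, W)

model = ColumnClassifier()
images = torch.randn(4, 3, 32, 64)        # batch of 4 RGB images, 64 columns wide
targets = torch.tensor([16, 3, 40, 63])   # column index of the plant row per image
logits = model(images)                    # (4, 64): one logit per column
loss = nn.CrossEntropyLoss()(logits, targets)
loss.backward()
pred_column = logits.argmax(dim=1)        # predicted column for each image
```

Training with `nn.CrossEntropyLoss` treats each column index as a class, as described above. A fully-connected head on flattened features would also work for a fixed image size, but the convolutional version reuses the same weights at every column.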

ptrblck · June 18, 2020, 8:22am

This sounds rather like a segmentation use case, where each pixel will get a class prediction.

torchvision provides a few segmentation models, which you could try out for your use case.
Let me know if I misunderstood your question.
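Once a segmentation model predicts a class per pixel, the column of the plant row can be read off its output. torchvision's segmentation models (for example `torchvision.models.segmentation.fcn_resnet50`) return a dict whose `"out"` entry has shape (N, num_classes, H, W). The sketch below assumes class 1 means "plant" and simply picks the column with the most plant pixels; the helper `row_column_from_logits` is hypothetical, and synthetic logits stand in for a real model's output:

```python
import torch

def row_column_from_logits(logits: torch.Tensor, plant_class: int = 1) -> torch.Tensor:
    """Hypothetical helper: estimate the column of a vertical plant row.

    logits: (N, num_classes, H, W), e.g. the "out" entry of a torchvision
    segmentation model's output dict.
    Returns a LongTensor of shape (N,) with one column index per image.
    """
    pred = logits.argmax(dim=1)          # per-pixel class prediction: (N, H, W)
    mask = pred == plant_class           # pixels predicted as plant: (N, H, W)
    counts = mask.sum(dim=1)             # plant pixels in each column: (N, W)
    return counts.argmax(dim=1)          # column with the most plant pixels

# Synthetic example: 2 classes (0 = background, 1 = plant), an 8x20 image,
# with a fake plant row painted at column 13.
logits = torch.zeros(1, 2, 8, 20)
logits[:, 0] = 1.0           # background wins everywhere by default
logits[:, 1, :, 13] = 2.0    # plant wins in column 13
print(row_column_from_logits(logits))  # tensor([13])
```

The returned index is meaningless when no pixel is predicted as plant, so a real pipeline should check the mask before trusting the result.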

collinbrake · June 18, 2020, 1:55pm

Thanks ptrblck! Segmentation looks like the right method for my use case.