I am trying to extract the pixel location of an object (a row of plants, in my application) from an image. From this related topic I have gathered that CNNs are not designed to provide location information about objects in an image, but rather to detect an object anywhere in the image by learning a translation-invariant model.
Is there any way to tease this positional information out of a CNN, or is a fully-connected model the route to take? Could a CNN be trained to treat an image with a vertical row of plants at, say, column #16 as an entirely different class from an image with a row of plants at column #244? In that case I envision a discrete set of classes, one per image column, with the network returning 16 for the image whose row is at column #16.
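To make the labelling scheme I have in mind concrete, here is a minimal sketch in plain Python. It only shows how a column position could be encoded as a class target and decoded again. It contains no network at all. The image size (`WIDTH = 256`, `HEIGHT = 8`) and the helper names `make_image`, `column_to_one_hot` and `scores_to_column` are assumptions made up for illustration.

```python
# Hypothetical sketch of "one class per column"; no actual CNN is involved.
WIDTH = 256   # assumed image width, giving 256 possible classes
HEIGHT = 8    # assumed (tiny) image height, just for illustration


def make_image(row_col, width=WIDTH, height=HEIGHT):
    """Synthetic image: 1.0 in the column holding the plant row, 0.0 elsewhere."""
    return [[1.0 if x == row_col else 0.0 for x in range(width)]
            for _ in range(height)]


def column_to_one_hot(row_col, width=WIDTH):
    """Training target: a one-hot vector with one entry per image column."""
    return [1.0 if x == row_col else 0.0 for x in range(width)]


def scores_to_column(scores):
    """Decode per-class scores back to a column index by taking the argmax."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Two training examples that would be treated as entirely different classes.
image_16, target_16 = make_image(16), column_to_one_hot(16)
image_244, target_244 = make_image(244), column_to_one_hot(244)

# If a trained network produced scores shaped like the targets,
# decoding them would give back the column numbers.
predicted_16 = scores_to_column(target_16)
predicted_244 = scores_to_column(target_244)
```

Here the network's output layer would need `WIDTH` units, and the decoded class index is read directly as the column of the row.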