Object detection and counting when the labels are themselves images

Hello everyone,

I have a set of main images and one reference image per class/label.

Each main image contains a number of objects, each belonging to one of a fixed set of classes. I want a class-wise count of the objects in each main image. The catch is that the main images are not annotated with pixel-wise locations of the objects. Instead, I have a smaller image for each class, exactly one per class.

For example, think of the main image as a fruit basket. The labels are oranges, apples and grapes. Alongside the main images, I have smaller, individual images of an orange, an apple and a grape. My task is to return the number of oranges, apples and grapes given an image of a fruit basket.
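To make the input and output I have in mind concrete, here is a minimal sketch in Python using tiny integer arrays in place of real images. The names `count_exact_matches` and `count_per_class` are hypothetical, and the exact pixel-for-pixel matching is only a placeholder so the example runs; it is not a proposed solution and would not survive changes in scale, rotation, lighting or occlusion.

```python
import numpy as np

def count_exact_matches(main_image: np.ndarray, exemplar: np.ndarray) -> int:
    """Count positions where the exemplar appears pixel-for-pixel in the main image.

    Placeholder only: real fruit photos would never match exactly.
    """
    H, W = main_image.shape[:2]
    h, w = exemplar.shape[:2]
    count = 0
    # Slide the exemplar over every position where it fits entirely.
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            if np.array_equal(main_image[y:y + h, x:x + w], exemplar):
                count += 1
    return count

def count_per_class(main_image: np.ndarray, exemplars: dict) -> dict:
    """Return {label: count} given one exemplar image per label."""
    return {label: count_exact_matches(main_image, ex)
            for label, ex in exemplars.items()}

# Toy stand-ins for the per-class images (one per class).
exemplars = {
    "orange": np.array([[1, 1], [1, 1]]),
    "apple": np.array([[2, 2], [2, 2]]),
    "grape": np.array([[3]]),
}

# Toy stand-in for a "fruit basket" main image; 0 is background.
basket = np.zeros((6, 6), dtype=int)
basket[0:2, 0:2] = 1   # first orange
basket[0:2, 3:5] = 1   # second orange
basket[3:5, 0:2] = 2   # one apple
basket[5, 5] = 3       # first grape
basket[3, 4] = 3       # second grape

counts = count_per_class(basket, exemplars)
print(counts)
```

So the desired interface is: one main image plus a dictionary of per-class images in, a dictionary of per-class counts out, with no location annotations anywhere.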

How would you go about doing this?