Sorry, I'm a little confused.
Do you mean a classifier model, or something else?
I have a Faster R-CNN model, and in the training forward pass I give it the whole image together with the ground-truth boxes for every class present in that image.
Could you please explain what exactly you are trying to achieve by using a single crop as input?