Based on your answer, my initial idea is to turn the batched annotations into batched indices, and then batch-index the feature map to get the cropped features. Is that the right understanding?
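A minimal NumPy sketch of that idea follows. It assumes every crop in the batch has the same size (otherwise the crops cannot be stacked into one array), and that the annotations have already been converted to integer `(y0, x0)` top-left corners in feature-map coordinates. The function name `batch_crop` and the box format are hypothetical, chosen for illustration only. The index arrays broadcast to `(B, crop_h, crop_w)`, so a single indexing call gathers every window exactly, with no interpolation.

```python
import numpy as np

def batch_crop(fmap, boxes, crop_h, crop_w):
    """Crop one fixed-size window per sample from a batched feature map.

    fmap:  (B, C, H, W) array.
    boxes: (B, 2) integer array of (y0, x0) top-left corners, assumed to be
           in feature-map coordinates (hypothetical format for this sketch).
    Returns a (B, C, crop_h, crop_w) array. No bounds checking is done.
    """
    boxes = np.asarray(boxes, dtype=np.int64)
    B = fmap.shape[0]
    # Per-sample row and column indices covered by each crop window.
    ys = boxes[:, 0:1] + np.arange(crop_h)  # (B, crop_h)
    xs = boxes[:, 1:2] + np.arange(crop_w)  # (B, crop_w)
    # Shape the index arrays so they broadcast to (B, crop_h, crop_w).
    b_idx = np.arange(B)[:, None, None]     # (B, 1, 1)
    y_idx = ys[:, :, None]                  # (B, crop_h, 1)
    x_idx = xs[:, None, :]                  # (B, 1, crop_w)
    # Advanced indices separated by the channel slice put the broadcast
    # dimensions first, so this yields shape (B, crop_h, crop_w, C).
    crops = fmap[b_idx, :, y_idx, x_idx]
    return crops.transpose(0, 3, 1, 2)      # back to (B, C, crop_h, crop_w)

# Usage: two 3-channel 8x8 feature maps, one 3x4 crop from each.
fmap = np.arange(2 * 3 * 8 * 8).reshape(2, 3, 8, 8)
boxes = np.array([[1, 2], [4, 0]])
crops = batch_crop(fmap, boxes, 3, 4)
# Each crop should equal plain per-sample slicing, e.g. fmap[0, :, 1:4, 2:6].
```

The same broadcasting-index pattern carries over to PyTorch tensors. If the boxes differ in size, resampling ops such as torchvision's `roi_align` can produce equal-sized outputs, but they interpolate, so their result is generally not a pixel-exact crop.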
BTW, I saw this:
I think we have the same goal. I tried the code in that reply, but it doesn't seem to do an exact crop?