Anchor box size meaning in AnchorGenerator

This is a question about how anchor boxes are implemented in PyTorch, as I am new to it. I have read this code, along with a lot of other material in the torch repo:

Is the “sizes” argument to AnchorGenerator with respect to the original image size, or with respect to the feature map output by the backbone?

To simplify things, let’s say I’m only ever interested in detecting objects that are 32x32 pixels in my input images. My anchor box aspect ratio will definitely be 1.0, since height = width. But is the size I pass to AnchorGenerator 32? Or do I need to do some math based on the backbone (e.g. I have two 2x2 max-pooling layers with stride 2, so the size I give AnchorGenerator should be 32/(2^2) = 8)?

@fmassa if you’ve got the time, I’d appreciate a comment on this

Hi @millivolt9 ,

I faced the same question and did a lot of reading around to finally understand it. The anchor box sizes are with respect to the original image size, not the feature maps. Each spatial location in a feature map maps back to a corresponding location in the input image, and the anchor boxes are centered at that input-image location.
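To make that concrete, here is a minimal stdlib-only sketch (not torchvision’s actual code) of how a feature-map cell maps back to input-image coordinates; the `anchors_for_cell` helper and its arguments are hypothetical, but the size/ratio arithmetic follows torchvision’s convention (ratio = height / width, shift = stride * cell index):

```python
import math

def anchors_for_cell(fy, fx, stride, sizes, aspect_ratios):
    """Sketch: anchors for one feature-map cell, in INPUT-image pixels.

    (fy, fx) is the cell index on the feature map; stride is the total
    downsampling factor of the backbone (input pixels per feature cell).
    """
    cx, cy = fx * stride, fy * stride  # anchor center in input-image pixels
    boxes = []
    for size in sizes:            # size is in input-image pixels
        for ratio in aspect_ratios:
            h = size * math.sqrt(ratio)   # ratio = height / width
            w = size / math.sqrt(ratio)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# Two 2x2 max-pools with stride 2 give a total stride of 4; a size-32,
# ratio-1.0 anchor at cell (10, 10) is a 32x32 box in the input image.
boxes = anchors_for_cell(10, 10, stride=4, sizes=(32,), aspect_ratios=(1.0,))
```

So for the original question: you pass 32 (the object size in input pixels), not 32 divided by the backbone’s downsampling factor.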

The sizes and aspect_ratios tuples need to be of the same length, namely the number of feature maps (if you use an FPN, there are 5 feature maps by default; if you don’t use an FPN, there is only 1).

But within one feature map, sizes[i] and aspect_ratios[i] can be anything and can differ from those of another feature map, and the total number of anchor boxes per spatial location of feature map i is len(sizes[i]) * len(aspect_ratios[i]).
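The bookkeeping above can be sketched in a few lines; the specific size and ratio values here are just a common FPN-style configuration, not something mandated by the API:

```python
# One (sizes[i], aspect_ratios[i]) pair per feature map.
sizes = ((32,), (64,), (128,), (256,), (512,))    # e.g. 5 FPN levels
aspect_ratios = ((0.5, 1.0, 2.0),) * len(sizes)   # same ratios at every level

# The two tuples must have the same length (one entry per feature map),
# and each level gets len(sizes[i]) * len(aspect_ratios[i]) anchors
# per spatial location.
assert len(sizes) == len(aspect_ratios)
anchors_per_location = [len(s) * len(a) for s, a in zip(sizes, aspect_ratios)]
```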

For a more detailed explanation, refer to the Class description of AnchorGenerator - vision/ at a7e4fbdc925a5968988ccadd6dffe7abe274dcdc · pytorch/vision · GitHub

Hope this helps.